MRW-YOLO: a lightweight and high-precision network for small object detection in remote sensing images

Abstract

Object detection in remote sensing imagery faces significant challenges due to drastic scale variations, complex backgrounds, and the dense distribution of small targets. To address these challenges, we propose a lightweight and high-precision detection network, termed MRW-YOLO, built upon the YOLOv8n architecture. Specifically, we restructure the network by proposing an Asymmetric Dual-branch Detection Head (AD-Head). This is achieved by first pruning the low-resolution P5 detection head to eliminate computational redundancy and sharpen the overall focus on small-scale features. Within the AD-Head, we asymmetrically enhance the two remaining branches: a Multi-branch Dilated Feature Aggregation (MDFA) module is integrated into the P3 branch to aggregate multi-scale contextual information via parallel dilated convolutions without compromising spatial resolution; meanwhile, a Receptive Field Attention Mechanism (RFAM) is embedded in the P4 branch to dynamically regulate the receptive field and suppress background noise. In terms of optimization, we employ the Weigh-CIoU loss function guided by a Distance-Prioritized Regression (DPR) strategy to improve bounding box regression by explicitly weighting centre-point deviations. Extensive experiments on the widely used NWPU VHR-10, RSOD, and DIOR datasets, as well as the challenging RS-STOD dataset for super-tiny objects, demonstrate that MRW-YOLO achieves superior detection performance compared to state-of-the-art methods. Notably, our model maintains an extremely low parameter count (fewer than 2 M), which validates its effectiveness for resource-constrained applications.
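
The idea of weighting centre-point deviations in a CIoU-style loss can be illustrated with a minimal sketch. The abstract does not give the Weigh-CIoU formula, so the code below is not the authors' method: it is the standard CIoU loss with one hypothetical parameter, `centre_weight`, that scales the normalized centre-distance term. The function name and the `(x1, y1, x2, y2)` box layout are also assumptions made for illustration.

```python
import math


def ciou_loss(box, gt, centre_weight=1.0, eps=1e-9):
    """Standard CIoU loss for two axis-aligned (x1, y1, x2, y2) boxes.

    centre_weight is a HYPOTHETICAL knob (not taken from the paper) that
    scales the centre-distance penalty; 1.0 reproduces plain CIoU.
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter + eps
    iou = inter / union

    # Squared distance between the two box centres.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
        + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    enc_w = max(x2, gx2) - min(x1, gx1)
    enc_h = max(y2, gy2) - min(y1, gy1)
    c2 = enc_w ** 2 + enc_h ** 2 + eps

    # Aspect-ratio consistency term and its trade-off coefficient.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + centre_weight * rho2 / c2 + alpha * v
```

With `centre_weight` above 1.0, a prediction whose centre is offset from the ground truth is penalized more heavily than under plain CIoU, which is one plausible reading of "distance-prioritized" regression; the actual weighting scheme would have to be taken from the paper itself.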

Similar Papers
  • Research Article
  • Cited by 34
  • 10.3390/rs15082096
A Multi-Feature Fusion and Attention Network for Multi-Scale Object Detection in Remote Sensing Images
  • Apr 16, 2023
  • Remote Sensing
  • Yong Cheng + 9 more

Accurate multi-scale object detection in remote sensing images poses a challenge due to the complexity of transferring deep features to shallow features among multi-scale objects. Therefore, this study developed a multi-feature fusion and attention network (MFANet) based on YOLOX. By reparameterizing the backbone, fusing multi-branch convolution and attention mechanisms, and optimizing the loss function, the MFANet strengthened the feature extraction of objects at different sizes and increased the detection accuracy. The ablation experiment was carried out on the NWPU VHR-10 dataset. Our results showed that the overall performance of the improved network was around 2.94% higher than the average performance of every single module. Based on the comparison experiments, the improved MFANet demonstrated a high mean average precision of 98.78% for 9 classes of objects in the NWPU VHR-10 10-class detection dataset and 94.91% for 11 classes in the DIOR 20-class detection dataset. Overall, MFANet achieved an mAP of 96.63% and 87.88% acting on the NWPU VHR-10 and DIOR datasets, respectively. This method can promote the development of multi-scale object detection in remote sensing images and has the potential to serve and expand intelligent system research in related fields such as object tracking, semantic segmentation, and scene understanding.

  • Research Article
  • Cited by 7
  • 10.1109/tgrs.2024.3454355
Adaptive Feature Separation Network for Remote Sensing Object Detection
  • Jan 1, 2024
  • IEEE Transactions on Geoscience and Remote Sensing
  • Wenping Ma + 6 more

With the development of remote sensing technology, remote sensing object detection has been widely applied in various fields, but it still faces some thorny challenges, such as the following: 1) the complexity of object scale changes in remote sensing images makes it difficult to improve the performance of small object detection and 2) remote sensing images have complex backgrounds and densely arranged small and weak objects, which pose a serious problem of feature interference. To alleviate these challenges, we propose an end-to-end adaptive feature separation network called AFSNet, which includes a scale-aware module (SAM) and a class-aware module (CAM). The SAM mainly enables feature maps of different resolutions to detect objects of different scales. Shallow feature maps mainly suppress the features of large objects they contain to focus on small object detection, while deep feature maps increase the detailed features of large objects they contain to focus on large object detection. The CAM is mainly used to distinguish the features in the feature map by category, separating the features of different categories into different channels, thus mitigating the problem of inter-class feature interference, and blocking background interference. The effectiveness of this article has been proven on the NWPU VHR-10, IPIU-M, DIOR, and DOTA2.0 datasets. It can be widely applied in civilian, military, and other fields. Through experimental verification, our AFSNet achieved 97.70% mAP on the NWPU VHR-10 dataset, 78.9% mAP on the DIOR dataset, and 58.22% mAP on the DOTA2.0 dataset. Our code is available at: https://github.com/Xidian-AIGroup190726/AFSNet.

  • Conference Article
  • Cited by 3
  • 10.1109/ijcnn48605.2020.9207217
Discriminative Feature Pyramid Network For Object Detection In Remote Sensing Images
  • Jul 1, 2020
  • Xiaoqian Zhu + 5 more

Multi-class geospatial object detection in remote sensing images faces great challenges, such as large scale variability and complex backgrounds. Although the feature pyramid network (FPN) can alleviate the problem of scale variation to some extent, it causes a loss of spatial and semantic information, which is not conducive to object localization. To address this problem, this paper proposes a discriminative feature pyramid network (DFPN) by introducing a global guidance module (GGM) and a feature aggregation module (FAM). Specifically, the global guidance module delivers high-level semantic information to lower layers, so as to obtain feature maps with stronger semantic information and eliminate the interference caused by complex backgrounds. The feature aggregation module enhances the flow of information between different layers and better captures the discriminative information at each layer. We validate the effectiveness of our method on the NWPU VHR-10 and RSOD datasets, where the results outperform the baseline by 2.06 and 3.88 points, respectively.

  • Research Article
  • Cited by 30
  • 10.3390/rs14020427
Multiscale Object Detection in Remote Sensing Images Combined with Multi-Receptive-Field Features and Relation-Connected Attention
  • Jan 17, 2022
  • Remote Sensing
  • Jiahang Liu + 2 more

Object detection is an important task of remote sensing applications. In recent years, with the development of deep convolutional neural networks, object detection in remote sensing images has made great improvements. However, the large variation of object scales and complex scenarios will seriously affect the performance of the detectors. To solve these problems, a novel object detection algorithm based on multi-receptive-field features and relation-connected attention is proposed for remote sensing images to achieve more accurate detection results. Specifically, we propose a multi-receptive-field feature extraction module with dilated convolution to aggregate the context information of different receptive fields. This achieves a strong capability of feature representation, which can effectively adapt to the scale changes of objects, either due to various object scales or different resolutions. Then, a relation-connected attention module based on relation modeling is constructed to automatically select and refine the features, which combines global and local attention to make the features more discriminative and can effectively improve the robustness of the detector. We designed these two modules as plug-and-play blocks and integrated them into the framework of Faster R-CNN to verify our method. The experimental results on NWPU VHR-10 and HRSC2016 datasets demonstrate that these two modules can effectively improve the performance of basic deep CNNs, and the proposed method can achieve better results of multiscale object detection in complex backgrounds.
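
The way dilated convolutions give different receptive fields without shrinking the feature map can be checked with a little arithmetic. The sketch below is not taken from the paper: the helper names, the 3x3 kernel, the dilation rates (1, 2, 3), and the 80-pixel feature map are all assumptions chosen for illustration. It uses the standard relations for a stride-1 convolution: effective kernel size k + (k - 1)(d - 1), and output size (n + 2p - d(k - 1) - 1) / s + 1.

```python
def effective_kernel(k, d):
    """Effective kernel size of a k-tap convolution with dilation d."""
    return k + (k - 1) * (d - 1)


def same_padding(k, d):
    """Padding that keeps the spatial size unchanged for stride 1 and odd k."""
    return d * (k - 1) // 2


def conv_out_size(n, k, d, p, s=1):
    """Output length of a convolution over n inputs (standard formula)."""
    return (n + 2 * p - d * (k - 1) - 1) // s + 1


# Hypothetical parallel branches: 3x3 kernels with dilation rates 1, 2 and 3
# applied to a hypothetical 80x80 feature map.
for d in (1, 2, 3):
    p = same_padding(3, d)
    print(d, effective_kernel(3, d), conv_out_size(80, 3, d, p))
```

Each branch sees a wider context (effective kernels of 3, 5 and 7) while producing an output of the same spatial size, so the branch outputs can be concatenated or summed directly.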

  • Research Article
  • 10.1016/j.rineng.2026.109726
FEMT-YOLO: Frequency-enhanced multi-scale network for small object detection in aerial images
  • Mar 1, 2026
  • Results in Engineering
  • Bingyu Cao + 3 more

  • Research Article
  • Cited by 4
  • 10.1080/01431161.2024.2343137
EB-Net: an efficient balanced network for accuracy and speed of remote sensing detection
  • May 6, 2024
  • International Journal of Remote Sensing
  • Dehua Zhang + 4 more

Remote sensing detection is a difficult task that requires not only identifying objects with complex background, quality, and angle issues, but also being lightweight enough to run on edge devices. In real remote sensing scenarios, achieving accurate, fast, and low-resource-consumption automated detection remains a significant challenge. Therefore, this paper proposes an efficient balanced network (EB-Net) for real remote sensing devices. First, a dynamic sparse attention (DSA) mechanism is proposed, and its high performance is shown via complexity analysis. In addition, a new dynamic sparse transformer (DSFormer) is constructed using DSA, which enhances feature information and adapts to image resolution by self-attention and multi-headed attention, and achieves more flexible computation by random sampling. Then, three versions of discrete distribution IoU (DDIoU) are defined to suit various scenarios and tasks in remote sensing, and this loss function helps the model achieve high accuracy while remaining lightweight. Finally, to make the model more lightweight, a cropped SPPF (CroSPPF) is presented, which significantly improves computational efficiency by lightening the sequence structure and activation function. Ablation experiments are conducted on the NWPU VHR-10 dataset and demonstrate the effectiveness of the proposed methods. Numerous comparisons with state-of-the-art detectors are conducted on the NWPU VHR-10, RSOD and DOTA datasets. The experiments show that EB-Net outperforms state-of-the-art remote sensing detection models in comprehensive performance and achieves end-to-end accurate and lightweight detection.

  • Research Article
  • Cited by 4
  • 10.3390/rs16162884
Skip-Encoder and Skip-Decoder for Detection Transformer in Optical Remote Sensing
  • Aug 7, 2024
  • Remote Sensing
  • Feifan Yang + 2 more

The transformer architecture is gradually gaining attention in remote sensing. Many algorithms related to this architecture have been proposed. However, the DEtection TRansformer (DETR) has been proposed as a new approach for implementing object detection tasks. It uses the transformer architecture for feature extraction, and its improved derivative models are uncommon in remote sensing object detection (RSOD). Hence, we selected the DETR with the improved deNoising anchor boxes (DINO) model as a foundation, upon which we have made improvements under the characteristics of remote sensing images (RSIs). Specifically, we proposed the skip-encoder (SE) module that can be applied to the encoder stage of the model and the skip-decoder (SD) module for the decoder stage. The SE module can enhance the model’s ability to extract multiscale features. The SD module can reduce computational complexity and maintain the model performance. The experimental results on the NWPU VHR-10 and DIOR datasets demonstrate that the SE and SD modules can improve DINO for better learning small- and medium-sized targets in RSIs. We achieved a mean average precision of 94.8% on the NWPU VHR-10 dataset and 75.6% on the DIOR dataset.

  • Research Article
  • Cited by 8
  • 10.1142/s021812662250147x
VC-YOLO: Towards Real-Time Object Detection in Aerial Images
  • Mar 7, 2022
  • Journal of Circuits, Systems and Computers
  • Bo Jiang + 3 more

Object detection for aerial images is a crucial and challenging task in the field of computer vision. Previous CNN-based methods face problems related to extreme variation of object scales and the complex background in aerial images, which vary significantly from natural scenes. On the other hand, many existing detectors rely heavily on computational performance and cannot handle real-time tasks. To address these problems, we propose a lightweight real-time object detection network named VC-YOLO. In the backbone, we introduce a receptive-field-extended backbone with a limited number of convolution layers to learn the features and context information of various objects. In the detection part, a channel attention module and a spatial attention module are used to generate discriminative feature representations. To make full use of semantic feature maps in the backbone network, we improve the feature pyramid network (FPN) with more lateral connections to reuse the features in each convolution stage. We evaluate VC-YOLO on the NWPU VHR-10 and VisDrone benchmark datasets. Experimental results show that VC-YOLO achieves superior detection accuracy with high efficiency compared with the existing methods.

  • Research Article
  • Cited by 19
  • 10.1109/access.2024.3479320
YOLO-Remote: An Object Detection Algorithm for Remote Sensing Targets
  • Jan 1, 2024
  • IEEE Access
  • Kaizhe Fan + 7 more

Unmanned Aerial Vehicles (UAVs) are indispensable in promoting the development of remote sensing technology. Nevertheless, the tasks of object recognition in remote sensing images based on UAV platforms face major difficulties and challenges due to the complex and variable background environments and the high-density distribution of objects. This paper proposes an object detection algorithm for UAV remote sensing images, YOLO-Remote, which aims to improve detection accuracy by enhancing YOLOv8. This algorithm innovatively integrates the SaElayer module to enhance the focus on remote sensing targets and improve network efficiency. Additionally, it introduces the Efficient-SPPF structure, which effectively expands the network’s receptive field and promotes deep learning capabilities. To address sample imbalance and improve bounding box localization and classification performance, the study also designs the Focaler-MDPIOU strategy. With these comprehensive optimizations, YOLO-Remote achieves significant progress in network architecture. Experiments were conducted on the NWPU VHR10 and RSOD datasets, and the experimental results show that compared to the base model YOLOv8n, the improved model’s average precision increased by 2.7% and 3.2% respectively, demonstrating its superiority in the field of object detection for UAV remote sensing images. The code is available at https://github.com/QuincyQAQ/Yolo-Remote.

  • Research Article
  • Cited by 1
  • 10.1109/jstars.2025.3582838
Text-Guided Distribution Calibration for Few-Shot Object Detection in Remote Sensing Images
  • Jan 1, 2025
  • IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
  • Yu Cao + 7 more

In recent years, few-shot object detection (FSOD) in remote sensing images (RSIs) has received increasing attention. However, due to the large difference in the number of labeled samples between the base classes and the novel classes, using only visual information for object detection will cause the features learned by the model to be biased towards the base classes, resulting in poor generalization ability in the novel classes with scarce labeled samples. In this paper, we propose a text-guided distribution calibration network for few-shot object detection on RSIs. Considering the limited visual information of the novel classes, we propose a cross-modal knowledge transfer strategy, which aims to extract the corresponding text feature of the object class name through the multi-modal pre-training model CLIP and transfer the text knowledge to the FSOD model, to mitigate the feature bias problem. Following this idea, we design a text-guided distribution calibration module (TDCM) which, for each query image, utilizes the intra-image object class distribution defined by the text features to calibrate the object class distribution computed from the visual features, using a knowledge distillation loss for model training. By doing this, the cross-class transferable text knowledge can be used to regularize the learned visual features, sidestepping bias towards the base classes and thus improving the generalization capacity. We conducted experiments on the NWPU VHR-10 and DIOR datasets and demonstrated the superior performance of the proposed method compared with several state-of-the-art methods.

  • Research Article
  • Cited by 1
  • 10.7717/peerj-cs.1965
MBAN: multi-branch attention network for small object detection.
  • Mar 29, 2024
  • PeerJ Computer Science
  • Li Li + 3 more

In recent years, small object detection has seen remarkable advancement. However, small objects are difficult to detect accurately in complex scenes due to their low resolution. The downsampling operation inevitably leads to the loss of information for small objects. To solve these issues, this article proposes a novel Multi-branch Attention Network (MBAN) to improve the detection performance of small objects. Firstly, an innovative Multi-branch Attention Module (MBAM) is proposed, which consists of two parts: a multi-branch structure consisting of convolution and max pooling, and the parameter-free SimAM attention mechanism. By combining these two parts, the number of network parameters is reduced, the information loss of small objects is reduced, and the representation of small object features is enhanced. Furthermore, to systematically solve the problem of small object localization, a pre-processing method called Adaptive Clustering Relocation (ACR) is proposed. To validate our network, we conducted extensive experiments on two benchmark datasets, i.e. NWPU VHR-10 and PASCAL VOC. The experimental findings demonstrate significant performance gains of MBAN over most existing algorithms: MBAN achieved an mAP of 96.55% and 84.96% on the NWPU VHR-10 and PASCAL VOC datasets, respectively, which shows that MBAN performs strongly in small object detection.

  • Research Article
  • Cited by 2
  • 10.3390/rs15215200
A Novel Adaptive Edge Aggregation and Multiscale Feature Interaction Detector for Object Detection in Remote Sensing Images
  • Nov 1, 2023
  • Remote Sensing
  • Wei Huang + 4 more

Object detection (OD) in remote sensing (RS) images is an important task in the field of computer vision. OD techniques have achieved impressive advances in recent years. However, complex background interference, large-scale variations, and dense instances pose significant challenges for OD. These challenges may lead to misalignment between features extracted by OD models and the features of real objects. To address these challenges, we explore a novel single-stage detection framework for the adaptive fusion of multiscale features and propose a novel adaptive edge aggregation and multiscale feature interaction detector (AEAMFI-Det) for OD in RS images. AEAMFI-Det consists of an adaptive edge aggregation (AEA) module, a feature enhancement module (FEM) embedded in a context-aware cross-attention feature pyramid network (2CA-FPN), and a pyramid squeeze attention (PSA) module. The AEA module employs an edge enhancement mechanism to guide the network to learn spatial multiscale nonlocal dependencies and solve the problem of feature misalignment between the network’s focus and the real object. The 2CA-FPN employs level-by-level feature fusion to enhance multiscale feature interactions and effectively mitigate the misalignment between the scales of the extracted features and the scales of real objects. The FEM is designed to capture the local and nonlocal contexts as auxiliary information to enhance the feature representation of information interaction between multiscale features in a cross-attention manner. We introduce the PSA module to establish long-term dependencies between multiscale spaces and channels for better interdependency refinement. Experimental results obtained using the NWPU VHR-10 and DIOR datasets demonstrate the superior performance of AEAMFI-Det in object classification and localization.

  • Research Article
  • Cited by 1
  • 10.14358/pers.25-00002r4
Global Multi-Scale Fusion Self-Calibration Network for Remote Sensing Object Detection
  • Oct 1, 2025
  • Photogrammetric Engineering & Remote Sensing
  • Yan Chen + 7 more

Applications of remote sensing images in both defense and civilian sectors have spurred substantial research interest. In the field of remote sensing, object detection confronts challenges such as complex backgrounds, scale diversity, and the presence of dense small objects. To address these issues, we propose an improved deep learning-based model, the Global Multi-scale Fusion Self-calibration Network, which is expected to contribute to alleviating the challenges. It consists of three main components: the hierarchical feature aggregation backbone, which uses improved modules such as the receptive field context-aware feature extraction module, the global information acquisition module, and the simple parameter-free attention module to extract key features and minimize the background interference. To couple multi-scale features, we enhanced the fusing component and designed the multi-scale enhanced pyramid structure integrating the proposed new modules. During the detection phase, especially when focusing on small object detection, we designed a novel convolutional attention feature fusion head. This head is constructed to integrate local and global branches for feature extraction by leveraging channel shuffling and multi-head attention mechanisms for efficient and accurate detection. Experiments on the Detection in Optical Remote Sensing Images (DIOR), Northwestern Polytechnical University Very High-Resolution‐10 (NWPU VHR‐10), remote sensing object detection (RSOD), and DOTAv1.0 data sets show that our method achieves mAP50(mean average precision at 50% intersection over union) of 69.7%, 91.3%, 94.2%, and 70.0%, respectively, outperforming existing comparative methods. The proposed network is expected to provide new perspectives for remote sensing tasks and possible solutions for relevant applications in the image domain.

  • Research Article
  • 10.3390/a18120751
Enhanced Remote Sensing Object Detection via AFDNet: Integrating Dual-Sensing Attention and Dynamic Bounding Box Optimization
  • Nov 28, 2025
  • Algorithms
  • Ziyan Wang + 2 more

Existing remote sensing object detection methods struggle with challenges such as complex background interference, variable object scales, and class imbalance due to a lack of coordinated internal optimization. This paper proposes AFDNet, a novel RSOD algorithm that establishes an internal collaborative evolution mechanism to systematically enhance the model’s feature perception and localization capabilities in complex scenes. AFDNet achieves this through three tightly coupled, co-evolving components: (1) A channel–spatial dual-sensing module that adaptively focuses on crucial features and suppresses background noise. (2) A dynamic bounding box optimization module that integrates distance-aware and scale-normalization strategies, significantly boosting localization accuracy and regression robustness for multi-scale objects. (3) A Gaussian adaptive activation unit that enhances the model’s nonlinear fitting capability for better detail extraction under weak conditions. Extensive experiments on two public datasets, RSOD and NWPU VHR-10, verify the excellent performance of AFDNet. AFDNet achieved a leading 95.16% mAP@50 on the RSOD dataset and an astonishing 96.52% mAP@50 on the NWPU VHR-10 dataset, which is significantly better than the mainstream detection models. This study verifies the effectiveness of introducing internal co-evolution mechanisms and provides a novel and reliable solution for high-precision remote sensing target detection.

  • Research Article
  • Cited by 70
  • 10.1109/tgrs.2023.3346041
Attention-Free Global Multiscale Fusion Network for Remote Sensing Object Detection
  • Jan 1, 2024
  • IEEE Transactions on Geoscience and Remote Sensing
  • Tao Gao + 5 more

Remote sensing object detection (RSOD) encounters challenges in complex backgrounds and small object detection, which are interconnected and cannot be addressed separately. To this end, we propose an attention-free global multiscale fusion network (AGMF-Net). Initially, we present a spatial bias module (SBM) to obtain long-range dependencies as a part of our proposed global information extraction module (GIEM). GIEM efficiently captures the global information, overcoming challenges posed by complex backgrounds. Moreover, we propose a multitask enhanced structure (MES) and multitask feature pretreatment (MFP) to enhance the feature representation of multiscale targets, while eliminating the interference from complex backgrounds. In addition, an efficient context decoupled detector (ECDD) is presented to provide distinct features for regression and classification tasks, aiming to improve the efficiency of RSOD. Extensive experiments demonstrate that our proposed method achieves superior performance compared with the state-of-the-art detectors. Specifically, AGMF-Net obtains a mean average precision (mAP) of 73.2%, 92.03%, 95.21%, and 94.30% on the detection in optical remote sensing images (DIOR), high resolution remote sensing detection (HRRSD), Northwestern Polytechnical University Very High Resolution-10 (NWPU VHR-10), and RSOD datasets, respectively.
