Abstract
Object detection in remote sensing images aims to interpret images to obtain the category and location of potential targets, which is of great importance in traffic detection, marine supervision, and space reconnaissance. However, the complex backgrounds and large scale variations in remote sensing images present significant challenges. Traditional methods relied mainly on image filtering or handcrafted feature descriptors to extract features, resulting in limited performance. Deep learning methods, especially one‐stage detectors such as the Real‐Time Object Detector (RTMDet), offer advanced solutions with efficient network architectures. Nevertheless, difficulties in extracting features from complex backgrounds and localising targets under large scale variations still limit detection accuracy. In this paper, an improved detector based on RTMDet, called the Multi‐Scale Feature Extraction‐assisted RTMDet (MRTMDet), is proposed, which addresses these limitations through enhanced feature extraction and feature fusion networks. At the core of MRTMDet are a new backbone network, MobileViT++, and a feature fusion network, SFC‐FPN, which enhance the model's ability to capture global and multi‐scale features through a carefully designed hybrid CNN/transformer feature processing unit based on the vision transformer (ViT) and through poly‐scale convolution (PSConv), respectively. Experiments on the DIOR‐R dataset demonstrate that MRTMDet achieves a competitive 62.2% mAP, balancing accuracy with a lightweight design.
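The multi-scale idea behind poly-scale convolution is that different channel groups of one layer use different dilation rates, so a single convolution mixes several receptive-field sizes. A minimal NumPy sketch of that idea follows; the function names, the toy grouping scheme, and the per-channel kernels are illustrative assumptions, not the paper's actual PSConv implementation:

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """Single-channel 2D convolution with zero 'same' padding and a given dilation."""
    kh, kw = kernel.shape
    pad = dilation * (kh // 2)          # keeps the output the same size as the input
    xp = np.pad(x, pad)
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(kh):
        for j in range(kw):
            # each kernel tap reads the input shifted by a dilated offset
            out += kernel[i, j] * xp[i * dilation : i * dilation + h,
                                     j * dilation : j * dilation + w]
    return out

def psconv_sketch(x, kernels, dilations):
    """Toy poly-scale convolution: channel groups get different dilation rates,
    so one layer aggregates several receptive-field scales (hypothetical sketch)."""
    c = x.shape[0]
    groups = np.array_split(np.arange(c), len(dilations))
    out = np.empty_like(x, dtype=float)
    for group, d in zip(groups, dilations):
        for ch in group:
            out[ch] = dilated_conv2d(x[ch], kernels[ch], d)
    return out
```

With dilations `[1, 2]` on a 4-channel map, the first two channels see a 3x3 neighbourhood and the last two an effective 5x5 one, at the same parameter cost.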