Object detection is a fundamental task in computer vision, and deep-learning-based detectors have recently made significant progress. However, large variations in target scale, the predominance of small targets, and complex backgrounds in remote sensing imagery mean that remote sensing object detection still suffers from low detection accuracy, poor real-time performance, and high missed- and false-detection rates in practical applications. To enhance remote sensing detection performance, this study proposes a new model, the remote sensing detection transformer (RS-DETR). First, we incorporate cascaded group attention (CGA) into the attention-driven feature interaction module; by capturing features at different levels and cascading them, CGA strengthens feature interaction while improving computational efficiency. Additionally, we propose an enhanced bidirectional feature pyramid network (EBiFPN) for multi-scale feature fusion; by integrating features across multiple scales, it improves object detection accuracy and robustness. Finally, we propose a novel bounding box regression loss function, Focaler-GIoU, which makes the model focus on difficult samples and improves detection of small and overlapping targets. Experimental results on the satellite imagery multi-vehicles dataset (SIMD) and the high-resolution remote sensing object detection (TGRS-HRRSD) dataset show that the improved algorithm achieves a mean average precision (mAP) of 78.2% and 91.6%, respectively, at an intersection-over-union threshold of 0.5, improvements of 2.0 and 1.5 percentage points over the baseline model. These results demonstrate the effectiveness and robustness of the proposed method for remote sensing image object detection.
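For readers unfamiliar with the Focaler family of IoU losses, the sketch below illustrates one plausible reading of the Focaler-GIoU loss named above: the IoU is linearly remapped onto an interval [d, u] so that gradients concentrate on harder (lower-overlap) samples, and this remapped term is combined with the standard GIoU loss. The threshold values `d` and `u` and the function names here are illustrative assumptions, not values taken from the paper.

```python
def focaler_giou_loss(box_a, box_b, d=0.0, u=0.95):
    """Focaler-GIoU loss for two axis-aligned boxes (x1, y1, x2, y2).

    Illustrative sketch: d and u are assumed remapping thresholds,
    not values reported in the abstract.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection and union
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest enclosing box (for the GIoU penalty term)
    c_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    giou = iou - (c_area - union) / c_area

    # Focaler remapping: linear rescale of IoU onto [d, u]
    if iou < d:
        iou_f = 0.0
    elif iou > u:
        iou_f = 1.0
    else:
        iou_f = (iou - d) / (u - d)

    # L_Focaler-GIoU = L_GIoU + IoU - IoU_focaler
    return (1.0 - giou) + iou - iou_f
```

For identical boxes the loss is 0, while disjoint boxes incur a loss above 1 because the GIoU penalty grows with the empty area of the enclosing box; low-IoU (difficult) pairs receive relatively larger loss than they would under plain GIoU.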