Progress in object detection is crucial for obtaining extensive scene information from aerial perspectives with computer vision. However, aerial image detection presents many challenges, such as large image backgrounds, small object sizes, and dense object distributions. This research addresses the specific challenges of small object detection in aerial images and proposes an improved YOLOv8s-based detector named Aerial Images Detector-YOLO (AID-YOLO). Specifically, this study takes the General Efficient Layer Aggregation Network (GELAN) from YOLOv9 as a reference and designs a module with four-branch skip-layer connections and a split operation, the Re-parameterization Net with Cross-Stage Partial (CSP) and Efficient Layer Aggregation Networks (RepNCSPELAN4), to make the network lightweight while capturing richer feature information. To fuse multi-scale features and focus more on the target detection regions, a new multi-channel feature extraction module named Convolutional Block Attention Module with Two Convolutions Efficient Layer Aggregation Networks (C2FCBAM) is designed in the neck of the network. In addition, to reduce the sensitivity of small objects to position bias, a new weight-adaptive loss function, Normalized Weighted Distance Complete Intersection over Union (NWD-CIoU_Loss), is designed. We evaluate the proposed AID-YOLO method through ablation experiments and comparisons with other advanced models on the VEDAI (512, 1024) and DOTAv1.0 datasets. The results show that, compared to the YOLOv8s baseline model, AID-YOLO improves the mAP@0.5 metric by 7.36% on the VEDAI dataset while reducing the number of parameters by 31.7%, achieving a good balance between accuracy and parameter count. The Average Precision (AP) for small objects improves by 8.9% over the YOLOv8s baseline, making AID-YOLO one of the top performers among all compared models. Furthermore, its frames-per-second (FPS) rate is suitable for real-time detection in aerial image scenarios. The AID-YOLO method also performs well on infrared images in the VEDAI1024 (IR) dataset, with a 2.9% improvement in the mAP@0.5 metric. We further validate the detection and generalization performance of AID-YOLO in multi-modal and multi-task scenarios through comparisons with other methods on images of different resolutions and on the SODA-A and DOTAv1.0 datasets. In summary, the results of this study confirm that the AID-YOLO method significantly improves detection performance while reducing the number of parameters, making it applicable to practical engineering tasks in aerial image object detection.
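To illustrate why a distance-based term can reduce sensitivity to position bias for small objects, the following is a minimal sketch of blending an NWD-style term with the CIoU loss. It is not the authors' implementation. It assumes that the NWD term follows the commonly used normalized Gaussian Wasserstein distance between boxes, that boxes are given as (cx, cy, w, h), and that a fixed blending weight stands in for the adaptive weighting, whose exact form is not given in the abstract. The function names, the constant `c = 12.8`, and the default `weight = 0.5` are hypothetical choices for illustration.

```python
import math


def ciou_loss(a, b, eps=1e-9):
    """CIoU loss between two boxes given as (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2

    # Intersection over union
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter + eps
    iou = inter / union

    # Squared center distance over squared diagonal of the enclosing box
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    c2 = enclose_w ** 2 + enclose_h ** 2 + eps
    rho2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # Aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan(b[2] / (b[3] + eps)) - math.atan(a[2] / (a[3] + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v


def nwd(a, b, c=12.8):
    """Normalized Gaussian Wasserstein distance similarity in (0, 1].

    c is a dataset-dependent scale constant (assumed value here).
    """
    w2 = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
          + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)
    return math.exp(-math.sqrt(w2) / c)


def nwd_ciou_loss(pred, target, weight=0.5, c=12.8):
    """Fixed-weight blend of NWD loss and CIoU loss (assumption: the
    paper's weighting is adaptive; a constant is used here for clarity)."""
    return weight * (1 - nwd(pred, target, c)) + (1 - weight) * ciou_loss(pred, target)
```

In this sketch, a one-pixel shift of a 4x4 box costs far more CIoU loss than the same shift of a 40x40 box, whereas the NWD term depends only on the absolute offset and still varies smoothly when the boxes no longer overlap. That is the motivation for combining the two terms.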