With the rapid development of the UAV industry, UAV‐based object detection has become a research hotspot. However, most current deep learning object detection models have large parameter counts and are difficult to deploy on embedded devices with limited memory and computational power. To address this problem, we propose SOD‐YOLO, a small object detection network for UAV aerial images based on YOLOv8, which meets the requirements of resource‐constrained devices while maintaining detection accuracy. First, a cross‐domain fusion attention (CDFA) mechanism is proposed and used to build the C2f‐Attention module, which is embedded in the backbone network to improve the extraction of key object features. Second, an AIFI_LSPE feature fusion module improved from RT‐DETR and the IoU‐aware query selection mechanism are added to the path aggregation network to improve the accuracy of multi‐scale object detection. In addition, to balance the sample size ratio and improve the robustness of the network model, we construct a new UAV image dataset named VisDrone2019 Extended Edition (VDEE) using images from the public VisDrone2019 and UAVDT datasets. Finally, Shape‐IoU is used as the loss function to reduce the discrepancy between the ground‐truth (GT) box and the predicted box. Experiments show that SOD‐YOLO achieves a mAP@0.5 of 42.8% on the VDEE dataset, an increase of 5.1% over YOLOv8. On the VisDrone2019 dataset, mAP@0.5 is 39.2%, an improvement of 5.8% over YOLOv8. © 2024 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.
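Both the mAP@0.5 metric and the IoU‐based loss above rest on the overlap between a GT box and a predicted box. The following is a minimal sketch of plain IoU and the 0.5 matching threshold, not the Shape‐IoU loss used in SOD‐YOLO; the function names `iou` and `is_match` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration.

```python
from typing import Tuple

# Assumed box layout: (x1, y1, x2, y2) in pixels, with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(box_a: Box, box_b: Box) -> float:
    """Plain intersection-over-union of two axis-aligned boxes."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(gt: Box, pred: Box, threshold: float = 0.5) -> bool:
    """True if the prediction overlaps the GT box enough to count at mAP@0.5."""
    return iou(gt, pred) >= threshold


# A 2-pixel shift on an 8x8 object already falls below the 0.5 threshold,
# while the same shift on an 80x80 object does not.
small_gt, small_pred = (0, 0, 8, 8), (2, 2, 10, 10)
large_gt, large_pred = (0, 0, 80, 80), (2, 2, 82, 82)
print(is_match(small_gt, small_pred), is_match(large_gt, large_pred))
```

The example boxes illustrate why small objects are hard to score well: the same localization error costs far more IoU on a small box than on a large one.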