The object of UAV target detection usually means small target with complicated backgrounds. In this paper, an object detection model M-YOLOv8s based on UAV aerial photography scene is proposed. Firstly, to solve the problem that the YOLOv8s model cannot adapt to small target detection, a small target detection head (STDH) module is introduced to fuse the location and appearance feature information of the shallow layers of the backbone network. Secondly, Inner-Wise intersection over union (Inner-WIoU) is designed as the boundary box regression loss, and auxiliary boundary calculation is used to accelerate the regression speed of the model. Thirdly, the structure of multi-scale feature pyramid network (MS-FPN) can effectively combine the shallow network information with the deep network information and improve the performance of the detection model. Furthermore, a multi-scale cross-spatial attention (MCSA) module is proposed to expand the feature space through multi-scale branch, and then achieves the aggregation of target features through cross-spatial interaction, which improves the ability of the model to extract target features. Finally, the experimental results show that our model does not only possess fewer parameters, but also the values of mAP0.5 are 6.6% and 5.4% higher than the baseline model on the Visdrone2019 validation dataset and test dataset, respectively. Then, as a conclusion, the M-YOLOv8s model achieves better detection performance than some existing ones, indicating that our proposed method can be more suitable for detecting the small targets.
Read full abstract