With the rapid development of unmanned aerial vehicle (UAV) technology, there is an urgent need for high-performance aerial object detection algorithms tailored for deployment on drones with limited computing capabilities. This paper proposes a series of improvements to the state-of-the-art YOLOv8 object detector to enhance its detection accuracy and speed on small, partially occluded objects in complex environments. Specifically, we introduce multi-scale feature fusion through additional detection layers, employ conditionally parameterized convolutions (CondConv) to increase representational capacity, and adopt Wise-IoU, a loss with a dynamic non-monotonic focusing mechanism, to enable more effective bounding-box regression. Experiments on the large-scale UAV benchmark dataset VisDrone demonstrate that the improved model achieves state-of-the-art accuracy of 37.6% mAP with 3 M parameters, outperforming other lightweight YOLO detectors as well as two-stage detectors such as Faster R-CNN. The improved model also reaches 40 FPS on an embedded edge device, validating its efficiency and suitability for real-time UAV applications. Through comprehensive quantitative experiments and visual results, this work provides valuable insights and techniques for tailoring object detection algorithms to robust and efficient deployment on UAVs with limited onboard computing power.
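The core of the Wise-IoU loss mentioned above is a distance-based attention factor that re-weights the plain IoU loss by how far the predicted box center drifts from the ground-truth center, normalized by the smallest enclosing box. The sketch below implements this v1 form for a single pair of axis-aligned boxes; the function name and plain-float interface are illustrative, and the full dynamic non-monotonic focusing of Wise-IoU v3 (an outlier-degree-based gradient gain on top of this term) is omitted here.

```python
import math

def wise_iou_v1(pred, target):
    """Sketch of the Wise-IoU v1 loss for one box pair.

    pred, target: (x1, y1, x2, y2) corner coordinates.
    Returns (loss, iou). Note: in a real detector the attention
    factor R_WIoU is detached from the gradient graph; with plain
    floats there is nothing to detach.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Standard intersection-over-union and the base IoU loss.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0
    l_iou = 1.0 - iou

    # Distance attention: squared center-to-center distance divided by
    # the squared diagonal of the smallest enclosing box (W_g, H_g).
    cx_d = ((px1 + px2) - (tx1 + tx2)) / 2.0
    cy_d = ((py1 + py2) - (ty1 + ty2)) / 2.0
    wg = max(px2, tx2) - min(px1, tx1)
    hg = max(py2, ty2) - min(py1, ty1)
    r_wiou = math.exp((cx_d ** 2 + cy_d ** 2) / (wg ** 2 + hg ** 2))

    return r_wiou * l_iou, iou
```

A perfectly aligned prediction yields zero loss, while a misplaced box is penalized more heavily the farther its center sits from the target, which is what sharpens regression on the small objects that dominate the VisDrone dataset.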