Abstract
With the growing use of unmanned aerial vehicles (UAVs), and the potential threat they pose to public safety, drone detection has become an increasingly important research direction. However, irregular shooting angles, deformations, significant target scale variations, and complex backgrounds in UAV imagery pose challenges for existing object detection models. To address these issues, we propose ALDNet, a precise and lightweight UAV image object detection network built on an improved RT-DETR model. First, we introduce Position Accurate Continuous Convolution (PAC), which dynamically adjusts convolution kernel coordinates, improving the backbone network's ability to capture objects of varied shapes. Second, we present an enhanced multi-scale feature fusion (EMF) approach that integrates spatial information more effectively, reducing the loss of fine-grained detail during fusion. Additionally, we propose the Inner-GIOU loss function, which reduces interference and accelerates the convergence of bounding-box optimisation, thus improving overall detection performance. We conducted experiments on three UAV image datasets, Det-Fly, ARD-MAV, and VisDrone, comparing ALDNet against the RT-DETR baseline. ALDNet achieved mAP50 values of 98.3\%, 97.4\%, and 48.1\% on Det-Fly, ARD-MAV, and VisDrone, exceeding RT-DETR by 1.5\%, 0.4\%, and 1.9\%, respectively, while using 25\% fewer parameters and 14\% less computation than the baseline. These results show that ALDNet achieves greater efficiency and effectiveness with less computational power.
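The abstract does not give the Inner-GIOU formula, but the name suggests the common Inner-IoU construction (an IoU computed over ratio-scaled auxiliary "inner" boxes centred on each box) combined with GIoU's enclosing-box penalty. A minimal sketch under that assumption, with hypothetical function names and a default scale `ratio=0.75` chosen for illustration:

```python
# Hedged sketch of an Inner-GIoU metric: Inner-IoU over ratio-scaled
# auxiliary boxes, plus GIoU's enclosing-box penalty on the originals.
# The paper's exact formulation may differ. Loss would be 1 - inner_giou.

def inner_giou(box1, box2, ratio=0.75):
    """Boxes are (x1, y1, x2, y2). Returns the Inner-GIoU score."""
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    def shrink(b):
        # Auxiliary "inner" box: same centre, sides scaled by `ratio`.
        cx, cy = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
        w, h = (b[2] - b[0]) * ratio, (b[3] - b[1]) * ratio
        return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

    def intersection(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        return iw * ih

    # Inner-IoU term on the scaled-down boxes.
    a, b = shrink(box1), shrink(box2)
    inner_inter = intersection(a, b)
    inner_union = area(a) + area(b) - inner_inter
    inner_iou = inner_inter / inner_union if inner_union > 0 else 0.0

    # GIoU penalty on the original boxes: (|C| - |union|) / |C|,
    # where C is the smallest enclosing box.
    inter = intersection(box1, box2)
    union = area(box1) + area(box2) - inter
    enc_w = max(box1[2], box2[2]) - min(box1[0], box2[0])
    enc_h = max(box1[3], box2[3]) - min(box1[1], box2[1])
    enc = enc_w * enc_h
    penalty = (enc - union) / enc if enc > 0 else 0.0

    return inner_iou - penalty
```

For identical boxes the score is 1.0 (perfect inner overlap, zero penalty); for disjoint boxes the inner term is 0 and the enclosing-box penalty drives the score negative, which is what gives GIoU-style losses a gradient even without overlap.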