Lightweight small object detection algorithms for unmanned aerial vehicles (UAVs) currently rely heavily on group convolutions, which incur a high Memory Access Cost (MAC) and make these models poorly suited to edge devices that depend on parallel computing. To address this issue, we propose SOD-YOLO, a model based on YOLOv7 that incorporates a DSDM-LFIM backbone network and adds a small object detection branch. The DSDM-LFIM backbone, which combines Deep-Shallow Downsampling Modules (DSD Modules) and Lightweight Feature Integration Modules (LFI Modules), avoids excessive use of group convolutions and element-wise operations. The DSD Module extracts both deep and shallow features from feature maps with fewer parameters, yielding richer feature representations. The LFI Module is a dual-branch structure designed to consolidate feature information. Experimental results demonstrate that SOD-YOLO achieves an AP50 of 50.7% at 72.5 FPS on the VisDrone validation set. Compared to YOLOv7, our model reduces computational cost by 20.25% and the number of parameters by 17.89%. After scaling the model's channel counts, it achieves an AP50 of 33.4% with an inference time of 27.3 ms on the Atlas 200I DK A2. These results indicate that SOD-YOLO can effectively detect small objects in the large volumes of aerial images captured by UAVs.
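The abstract only names the DSD Module's design goals (a deep path and a shallow path, few parameters, and avoidance of group convolutions and element-wise merging), so the following PyTorch sketch is one hypothetical reading of that description, not the paper's actual implementation; the class name DeepShallowDownsample and all layer choices are assumptions for illustration.

```python
import torch
import torch.nn as nn


class DeepShallowDownsample(nn.Module):
    """Hypothetical sketch of a deep-shallow downsampling block.

    Halves spatial resolution, extracts features along a deeper
    convolutional path and a cheaper shallow pooling path, and merges
    them by channel concatenation rather than element-wise addition
    (which the abstract says the backbone avoids). No group
    convolutions are used, keeping MAC low on parallel hardware.
    """

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        half = out_ch // 2
        # Deep branch: 1x1 bottleneck, then a strided 3x3 conv
        # (standard convolution, not grouped).
        self.deep = nn.Sequential(
            nn.Conv2d(in_ch, half, 1, bias=False),
            nn.BatchNorm2d(half),
            nn.SiLU(),
            nn.Conv2d(half, half, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(half),
            nn.SiLU(),
        )
        # Shallow branch: downsample by pooling, then a cheap 1x1 projection.
        self.shallow = nn.Sequential(
            nn.MaxPool2d(2, 2),
            nn.Conv2d(in_ch, half, 1, bias=False),
            nn.BatchNorm2d(half),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Merge by concatenation along channels: no element-wise add.
        return torch.cat([self.deep(x), self.shallow(x)], dim=1)


if __name__ == "__main__":
    block = DeepShallowDownsample(64, 128)
    y = block(torch.randn(1, 64, 160, 160))
    print(y.shape)  # torch.Size([1, 128, 80, 80])
```

Concatenation is chosen here because, unlike element-wise addition, it requires no extra read-modify-write pass over both feature maps, which is consistent with the abstract's stated aim of reducing MAC.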