Detection of small objects in Unmanned Aerial Vehicle (UAV) images, which contain a large number of objects occupying extremely few pixels, has been a major challenge over the last few decades. Owing to the lack of sufficiently detailed representations of object features, conventional deep-learning-based detectors perform unsatisfactorily when detecting small objects. Meanwhile, the flight altitude and shooting angle of the drone are constantly changing, which leads to uneven dispersion of multiple objects and varying density of small objects. In response to the issues mentioned above, a novel lightweight Cross-layer Triple-branch Parallel Fusion Network (CTPFNet) is proposed to improve the real-time detection accuracy of small objects in UAV images. Firstly, a novel downsampling structure called the Inverse Residual Pooling Cascade (IRPC) module is proposed to obtain richer feature information about small objects. Secondly, we construct an improved global feature extraction structure, Efficient Layer Aggregation Networks-Transformer (ELAN-Trans), to enhance the association between global features. Then, we design the Hybrid Dilated Depthwise Separable-Spatial Pyramid Pooling-Fast (HDDS-SPPF) module to capture more contextual information through dilated convolution operations. Finally, to enhance the cross-scale transfer and fusion of low-level features, which contain more detailed localization information, with deep-level features, which carry richer semantic information, we propose a CTPF module embedded into the neck for secondary feature reuse of adjacent feature maps with varying resolutions. Extensive experiments on the public VisDrone2021-DET dataset show that the proposed model achieves significant performance gains with fewer parameters.
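To make the HDDS-SPPF idea concrete, the following is a minimal, illustrative sketch of an SPPF-style block whose pooling branches are replaced by cascaded dilated depthwise separable convolutions. This is not the authors' implementation; the class names, dilation rates, and channel choices are hypothetical assumptions for exposition only.

```python
# Illustrative sketch only (hypothetical names and hyperparameters), assuming
# a standard PyTorch setup. It shows how cascaded dilated depthwise separable
# convolutions can enlarge the receptive field and fuse multi-scale context
# in an SPPF-like block.
import torch
import torch.nn as nn


class DilatedDWConv(nn.Module):
    """Depthwise separable conv with a configurable dilation rate."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        # Depthwise 3x3 conv; padding = dilation keeps the spatial size unchanged.
        self.dw = nn.Conv2d(channels, channels, kernel_size=3,
                            padding=dilation, dilation=dilation,
                            groups=channels, bias=False)
        # Pointwise 1x1 conv mixes information across channels.
        self.pw = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pw(self.dw(x))))


class HDDSSPPFSketch(nn.Module):
    """SPPF-like block: each dilated branch processes the previous branch's
    output (a cascade), and all intermediate outputs are concatenated and
    fused by a 1x1 conv to aggregate context at several receptive fields."""
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 2, 3)):
        super().__init__()
        hidden = in_ch // 2
        self.reduce = nn.Conv2d(in_ch, hidden, kernel_size=1, bias=False)
        self.branches = nn.ModuleList(
            DilatedDWConv(hidden, d) for d in dilations)
        self.fuse = nn.Conv2d(hidden * (len(dilations) + 1), out_ch,
                              kernel_size=1, bias=False)

    def forward(self, x):
        x = self.reduce(x)
        outs = [x]
        for branch in self.branches:       # cascade: each branch refines
            outs.append(branch(outs[-1]))  # the previous branch's output
        return self.fuse(torch.cat(outs, dim=1))


if __name__ == "__main__":
    feat = torch.randn(1, 256, 20, 20)           # dummy backbone feature map
    print(HDDSSPPFSketch(256, 256)(feat).shape)  # -> torch.Size([1, 256, 20, 20])
```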