Abstract

UAV sampling not only adapts to a variety of complex terrain environments but also provides a broader field of view. However, images captured by UAVs usually contain complex backgrounds and a large number of small objects, which poses a significant challenge to existing advanced object detectors. Moreover, some state-of-the-art lightweight detectors still carry too many parameters and too much computational overhead to suit lightweight devices. To address these issues, we propose a single-stage detector named the feature enhancement and shift lightweight network. Firstly, we propose a lightweight adjust convolution, which unfolds the features and encodes the 3 × 3 background information into information-rich 1 × 1 features through average pooling and convolution layers, efficiently enhancing the representation of features extracted by 1 × 1 convolution. Next, to suppress complex background information efficiently, we propose a three-dimension attention module, which exchanges information across the C-W, C-H, and H-W dimensions in a unique way to obtain three efficient attention maps that highlight important information and weaken irrelevant information. Moreover, we design a novel receptive-field feature enhancement convolution, which unfolds the features and then lets the 3 × 3 features interact to obtain position-wise weights. Combined with the weighted features, the 3 × 3 convolution becomes, in principle, a parameter-unshared convolution, which enhances its ability to capture detailed information. Finally, to retain richer object and semantic information, we carefully analyze the down-sampling convolution and propose a feature shift down-sampling convolution, which we combine with an improved neck to obtain a new lightweight neck. Experiments on the VisDrone-DET2021 dataset show that our method achieves 36.21% mAP50, 9.78% higher than the baseline model YOLOv5n.
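The three-dimension attention described above can be illustrated with a minimal NumPy sketch: pooling the feature tensor along one axis yields a 2-D attention map over the remaining plane (H-W, C-W, or C-H), and the three reweighted results are averaged. The function names and the mean + max pooling choice are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def branch_attention(x, axis):
    """Pool the tensor along `axis` (mean + max), producing a 2-D map
    over the remaining two dimensions, squashed into (0, 1)."""
    pooled = np.mean(x, axis=axis) + np.max(x, axis=axis)
    return sigmoid(pooled)

def three_dim_attention(x):
    """x: feature map of shape (C, H, W). Build attention maps on the
    H-W, C-W and C-H planes and use each to reweight the features.
    Hypothetical sketch of the idea, not the published module."""
    a_hw = branch_attention(x, axis=0)   # (H, W) map, pooled over C
    a_cw = branch_attention(x, axis=1)   # (C, W) map, pooled over H
    a_ch = branch_attention(x, axis=2)   # (C, H) map, pooled over W
    y_hw = x * a_hw[None, :, :]
    y_cw = x * a_cw[:, None, :]
    y_ch = x * a_ch[:, :, None]
    return (y_hw + y_cw + y_ch) / 3.0
```

Because each branch only permutes which axis is pooled, the module adds no learned parameters in this sketch, which is consistent with the lightweight goal of the paper.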
Meanwhile, compared with the advanced lightweight networks YOLOX-tiny, YOLOv6n, YOLOv7-tiny, and YOLOv8n, our network achieves superior detection results with fewer parameters. We also compare our network with recent networks trained on UAV-captured images and experimentally demonstrate that it achieves excellent performance with only 1.7 M parameters and 8.3 GFLOPs.
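The lightweight adjust convolution is described only at a high level in the abstract, but the core idea, enriching cheap 1 × 1 convolution features with 3 × 3 neighbourhood context gathered by average pooling, can be sketched as follows. The weight layout, the additive fusion, and the function names are assumptions for illustration.

```python
import numpy as np

def avg_pool_3x3(x):
    """3x3 average pooling, stride 1, zero padding: each output pixel
    summarises its 3x3 neighbourhood."""
    C, H, W = x.shape
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += p[:, dy:dy + H, dx:dx + W]
    return out / 9.0

def adjust_conv_1x1(x, w_point, w_context):
    """Hypothetical sketch: a 1x1 convolution whose input is enriched
    with 3x3 neighbourhood context from average pooling.
    w_point, w_context: (C_out, C_in) weight matrices."""
    ctx = avg_pool_3x3(x)                          # (C_in, H, W)
    # A 1x1 convolution is a per-pixel matrix multiply over channels.
    y_point = np.einsum('oc,chw->ohw', w_point, x)
    y_ctx = np.einsum('oc,chw->ohw', w_context, ctx)
    return y_point + y_ctx
```

Relative to a plain 3 × 3 convolution, this keeps the per-pixel cost at two 1 × 1 matrix multiplies plus a parameter-free pooling pass, which is where the parameter savings come from.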
