Abstract

Convolutional neural networks (CNNs) have achieved remarkable performance in various computer vision tasks, including object detection. However, object detection from unmanned aerial vehicles (UAVs) remains difficult: backgrounds are complex and onboard resources are limited, which leads to information loss for edges and small objects and to high computational costs. To address these issues, we propose a CNN building block called reparameterized fusion convolution (RFConv), which incorporates multiscale convolution branches to capture small-object information and expand the receptive field. During inference, reparameterization reduces the computational overhead by pruning. Moreover, we observe that convolution and self-attention, two powerful techniques, follow different design paradigms, yet much of their computation is carried out through similar operations. Consequently, we propose a hybrid module that enables parameter reuse and information flow between convolution and self-attention, harnessing the advantages of both methods at minimal computational cost for detecting edges and small objects. Based on different combinations of the hybrid module and RFConv, we design a family of multiscale leapfrog structures (MLS) to satisfy various usage requirements. Additionally, we propose a variability boundary (VB) activation function that reuses network information to adaptively adjust its nonlinearity and gradient characteristics, addressing the distinct activation function requirements of convolution and self-attention. We incorporated our proposed method into YOLOv5s, achieving 95.46 AP0.5 and 27.3 AP0.5 on the UAV datasets NWPU VHR-10 and VisDrone2019, respectively, and into YOLOv5m, obtaining 92.58 mAP on the general-purpose PASCAL VOC dataset. These results demonstrate that our method can enhance the effectiveness of existing detectors.
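The abstract does not give the internals of RFConv, so the following is only a minimal sketch of the general idea behind structural reparameterization: parallel convolution branches of different kernel sizes, whose outputs are summed during training, can be collapsed into one equivalent kernel for inference. It assumes a single channel, stride 1, zero padding, and no bias or normalization; the function names (`conv2d_same`, `pad_kernel`, `fuse_branches`, `max_abs_difference`) are hypothetical and are not taken from the paper.

```python
import random


def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (output keeps input size)."""
    k = len(kernel)  # assumes an odd, square kernel
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def pad_kernel(kernel, size):
    """Embed an odd square kernel in the centre of a size x size zero kernel."""
    k = len(kernel)
    off = (size - k) // 2
    padded = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            padded[off + i][off + j] = kernel[i][j]
    return padded


def fuse_branches(kernels):
    """Sum parallel branch kernels into one kernel of the largest branch size."""
    size = max(len(k) for k in kernels)
    fused = [[0.0] * size for _ in range(size)]
    for kernel in kernels:
        padded = pad_kernel(kernel, size)
        for i in range(size):
            for j in range(size):
                fused[i][j] += padded[i][j]
    return fused


def max_abs_difference(seed=0, height=6, width=7):
    """Compare the multi-branch output with the fused single-kernel output."""
    rng = random.Random(seed)
    image = [[rng.uniform(-1, 1) for _ in range(width)] for _ in range(height)]
    # Illustrative 1x1, 3x3 and 5x5 branches with random weights.
    kernels = [
        [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(k)]
        for k in (1, 3, 5)
    ]
    # Training-time view: run every branch, then add the feature maps.
    branch_sum = [[0.0] * width for _ in range(height)]
    for kernel in kernels:
        out = conv2d_same(image, kernel)
        for i in range(height):
            for j in range(width):
                branch_sum[i][j] += out[i][j]
    # Inference-time view: one convolution with the fused kernel.
    fused_out = conv2d_same(image, fuse_branches(kernels))
    return max(
        abs(branch_sum[i][j] - fused_out[i][j])
        for i in range(height)
        for j in range(width)
    )


if __name__ == "__main__":
    print("max |multi-branch - fused| =", max_abs_difference())
```

Because convolution is linear, the two outputs agree up to floating-point rounding, so the extra branches cost nothing at inference in this simplified setting. Whether RFConv fuses its branches in exactly this way is an assumption; the paper's own formulation should be consulted for multi-channel layers, normalization, and the pruning step it mentions.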
