Abstract

Unmanned aerial vehicles (UAVs) play an important role in conducting automatic patrol inspections of cities, which can ensure the safety of urban residents’ life and property and the normal operation of cities. However, during the inspection process, problems may arise. For example, numerous small objects in UAV images are difficult to detect, objects in UAV images are severely occluded and requirements for real-time performances are posed. To address these issues, we first propose a real-time object detection network (RTD-Net) for UAV images. Besides, to deal with the lack of visual features of small objects, we design a feature fusion module (FFM) to interact and fuse features at different levels and improve the feature expression ability of small objects. To achieve real-time detection, we design a lightweight feature extraction module (LEM) to build the backbone network to control the calculation quantity and parameters. To solve the issue of discontinuous features of occluded objects, an efficient convolutional Transformer block (ECTB) based convolutional multi-head self-attention (CMHSA) is designed to improve the recognition ability of occluded objects by extracting the context information of objects. Compared with multi-head self-attention (MHSA) in the traditional transformer, CMHSA uses convolutional projection to replace the position-linear projection, which can reduce a large amount of calculation without performance loss. Finally, an attention prediction head (APH) is designed based on the attention mechanism to improve the ability of the model to extract attention regions in complex scenarios. The proposed method reaches a detection accuracy of 86.4% mAP in our UAV image dataset. In addition, it achieves a detection accuracy of 86.0%mAP and a detection speed of 33.4 FPS in NVIDIA Jeston TX2 embedded device.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.