Abstract

Object detection in aerial images is vital for autonomous guidance, navigation and control, and situational awareness. However, researchers in this field still face many challenges, including the range of target scales, the perspective from which images are taken, and highly complex backgrounds. This paper introduces a robust object detector optimized for multi-scale objects and for object instances captured from an overhead perspective in aerial images. First, in the feature extraction stage, an effective multi-scale detector (MSD) is designed to search for objects of different scales in the feature maps. Second, to detect small targets against a cluttered background, the shallow- and deep-layer features are densely connected by deconvolution, addressing the low dimensionality of the deep layers and the inadequate representation of small objects. In the experiments, we analyze the impact of these components on the model and compare the proposed method with other state-of-the-art approaches on two publicly available datasets captured by satellites and high-altitude UAVs. The results show that the proposed method is more effective and robust, and is applicable to a wider range of aerial images.
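
To make the idea of connecting shallow and deep features by deconvolution concrete, the following is a minimal single-channel sketch in plain Python. It is not the paper's architecture: the stride-2 transposed convolution with a 2x2 kernel, the fixed kernel weights, the channel-stacking fusion, and the names `deconv2x` and `fuse` are all assumptions made for illustration.

```python
from typing import List

Grid = List[List[float]]


def deconv2x(feature: Grid, kernel: Grid) -> Grid:
    """Stride-2 transposed convolution of a single-channel map with a 2x2 kernel.

    Each input value is scaled by the kernel and written into its own 2x2
    output tile, so the output is twice as large in each dimension.
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] += feature[i][j] * kernel[di][dj]
    return out


def fuse(shallow: Grid, deep: Grid, kernel: Grid) -> List[Grid]:
    """Upsample the deep map and stack it with the shallow map as two channels.

    Stacking (concatenation) is an assumption; the source only says the
    features are "densely connected".
    """
    up = deconv2x(deep, kernel)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("shallow map must be exactly twice the size of the deep map")
    return [shallow, up]


# Toy inputs: a 4x4 shallow map and a 2x2 deep map.
shallow = [[1.0, 2.0, 3.0, 4.0],
           [5.0, 6.0, 7.0, 8.0],
           [9.0, 10.0, 11.0, 12.0],
           [13.0, 14.0, 15.0, 16.0]]
deep = [[1.0, 2.0],
        [3.0, 4.0]]
# Hypothetical fixed weights; in a trained network these would be learned.
kernel = [[1.0, 1.0],
          [1.0, 1.0]]

fused = fuse(shallow, deep, kernel)
```

With the all-ones kernel used here, the transposed convolution reduces to nearest-neighbour upsampling, so `fused[1]` is the deep map with each value repeated over a 2x2 tile. A later layer can then read fine spatial detail from the shallow channel and coarser context from the upsampled deep channel at the same location.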

Highlights

  • As airborne cameras and remote sensing systems continue to develop, high-resolution aerial images captured by unmanned aerial vehicles (UAVs) and satellites are increasingly available as data for researchers

  • Compared with the most accurate of the other methods, the proposed method improves mean average precision (mAP) on DOTA-v1.5 by 19.79%, and its accuracy with the multi-scale detector (MSD) structure is 13.56% higher than without it

  • To overcome the shortcomings caused by multi-scale targets, especially small instances, a robust object detector based on a deep neural network is proposed
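
The highlights report results as mean average precision (mAP). As a rough illustration of how such a score is typically formed, the sketch below computes per-class average precision from ranked detections using all-point interpolation and then averages over classes. The evaluation protocol actually used for DOTA-v1.5 in the paper is not given here, so the interpolation scheme, the input format, and the function names are assumptions.

```python
from typing import Dict, List, Tuple

Detection = Tuple[float, bool]  # (confidence, is_true_positive)


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """Area under the precision-recall curve with all-point interpolation.

    detections: scored detections for one class, already matched to ground truth.
    num_gt: number of ground-truth objects of that class.
    """
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        tp += int(is_tp)
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Interpolate: precision at each rank becomes the best precision at any lower rank.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas over the recall increments.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """Unweighted mean of the per-class AP values."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


# Toy data with hypothetical class names.
toy = {
    "plane": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "ship": ([(0.6, True)], 1),
}
plane_ap = average_precision(*toy["plane"])  # 5/6, about 0.833
toy_map = mean_average_precision(toy)        # 11/12, about 0.917
```

In the toy data, the false positive ranked second for "plane" lowers its AP to 5/6, while "ship" scores a perfect 1.0, giving a mAP of 11/12.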


Summary

INTRODUCTION

As airborne cameras and remote sensing systems continue to develop, high-resolution aerial images captured by unmanned aerial vehicles (UAVs) and satellites are increasingly available as data for researchers. COCO and VOC are general object detection datasets captured in natural scenes and are widely used to evaluate the performance of object detection models. Most images in these datasets are shot horizontally and at close range. The images in this dataset are mostly shot from an overhead view, and the instances are relatively small. These backbone networks contain the traditional convolution and pooling operations, which are widely used for preliminary feature extraction in typical CNN models.

RELATED WORK
MATERIALS AND METHODS
LOSS FUNCTION
Findings
CONCLUSION AND FUTURE RESEARCH
