Abstract

Deep learning methods, especially convolutional neural networks (CNNs), have recently made ground-breaking advances in object detection in very-high-resolution (VHR) optical remote sensing images. However, because CNNs were originally designed for the classification of natural images, these methods are not well suited to object detection in remote sensing images. First, current CNN-based approaches have difficulty dealing with objects that exhibit large rotation variation, which is widespread in optical remote sensing images. Second, CNN-based detectors have only limited receptive fields and thus can hardly exploit the global contextual information that is essential for the accurate detection of small targets. To address these two issues, this article proposes a novel deep learning-based object detection framework comprising a geometric transform module (GTM) and a global contextual feature fusion module (GCFM). Specifically, the GTM combines rotation and flip transformations to handle the multiangle characteristics of objects, while the GCFM uses a spatial attention mechanism to adaptively incorporate global contextual information into feature maps, improving the recognition and localization accuracy of targets. We integrate the two modules into the YOLOv3 framework to achieve end-to-end detection with high performance and efficiency. Comprehensive evaluations on three publicly available object detection data sets demonstrate the excellent performance of the proposed method.
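The abstract does not give implementation details for the two modules, but the two underlying ideas can be illustrated. The sketch below is a minimal NumPy illustration, not the authors' method: `geometric_transforms` enumerates the rotation-and-flip views a GTM-style module would cover, and `spatial_attention_fusion` shows one common way a spatial attention map can inject a global context vector back into a feature map. All function names, the channels-first layout, and the mean-over-channels scoring are assumptions for illustration only.

```python
import numpy as np

def geometric_transforms(feat):
    # feat: (C, H, W) feature map or image tensor.
    # Enumerate the 8 rotation/flip views (0/90/180/270 degrees,
    # each with and without a horizontal flip) -- the symmetry group
    # a rotation-and-flip transform module is built around.
    rots = [np.rot90(feat, k, axes=(1, 2)) for k in range(4)]
    return rots + [r[:, :, ::-1] for r in rots]

def spatial_attention_fusion(feat):
    # feat: (C, H, W). One illustrative form of spatial attention:
    # score each spatial position, normalize the scores into an
    # attention map, pool a global context vector with it, and add
    # that vector back to every position of the feature map.
    scores = feat.mean(axis=0)                 # (H, W) position scores
    w = np.exp(scores - scores.max())          # softmax over positions
    w /= w.sum()
    context = (feat * w[None]).sum(axis=(1, 2))  # (C,) global context
    return feat + context[:, None, None]       # broadcast fusion
```

In a real detector both steps would use learned convolutions rather than a fixed channel mean, but the data flow (score, normalize, pool, broadcast back) is the part the spatial attention mechanism describes.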
