Abstract

Convolutional neural networks (CNNs) have demonstrated their ability in object detection on very high resolution (VHR) remote sensing images. However, CNNs have obvious limitations in modeling geometric variations in remote sensing targets. In this paper, we introduce a CNN structure, the deformable ConvNet, to address geometric modeling in object recognition. By adding offsets to the convolution layers, the feature mapping of a CNN can be applied to unfixed locations, enhancing the CNN's understanding of visual appearance. In our work, a deformable region-based fully convolutional network (R-FCN) was constructed by substituting the regular convolution layer with a deformable convolution layer. To use this deformable convolutional neural network (ConvNet) efficiently, we developed a training mechanism. We first set the pre-trained R-FCN natural image model as the default network parameters of the deformable R-FCN. Then, this deformable ConvNet was fine-tuned on VHR remote sensing images. To remedy the increase in line-like false region proposals, we developed aspect ratio constrained non maximum suppression (arcNMS), which improved the detection precision of the deformable ConvNet. An end-to-end approach was then developed by combining the deformable R-FCN, a smart fine-tuning strategy and aspect ratio constrained NMS. The developed method outperformed a state-of-the-art benchmark in object detection without data augmentation.
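The idea of applying a convolution at unfixed locations can be illustrated with a minimal NumPy sketch. It evaluates a single 3x3 deformable convolution output at one position: each kernel tap reads the feature map at its regular grid position plus a fractional (dy, dx) offset, using bilinear interpolation. The function names `bilinear_sample` and `deformable_conv3x3_at` are hypothetical, and the offsets are passed in by hand here, whereas a deformable ConvNet would learn to predict them. This is an illustration under those assumptions, not the implementation used in the paper.

```python
import numpy as np

def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2D feature map at a fractional (y, x).

    Positions outside the map contribute zero (an assumption of this sketch).
    """
    h, w = feature.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1.0 - abs(y - yi)) * (1.0 - abs(x - xi))
                value += weight * feature[yi, xi]
    return value

def deformable_conv3x3_at(feature, kernel, offsets, cy, cx):
    """Output of one 3x3 deformable convolution centred at (cy, cx).

    offsets has shape (3, 3, 2): one (dy, dx) pair per kernel tap.
    With all-zero offsets this reduces to a regular 3x3 convolution.
    """
    out = 0.0
    for i, dy in enumerate((-1, 0, 1)):
        for j, dx in enumerate((-1, 0, 1)):
            off_y, off_x = offsets[i, j]
            out += kernel[i, j] * bilinear_sample(
                feature, cy + dy + off_y, cx + dx + off_x
            )
    return out

# Illustrative input: a 5x5 feature map whose value grows by 1 per column.
feature = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.full((3, 3), 1.0 / 9.0)      # simple averaging kernel
zero_offsets = np.zeros((3, 3, 2))       # regular sampling grid
shifted = np.zeros((3, 3, 2))
shifted[..., 1] = 0.5                    # move every tap half a pixel right

regular_output = deformable_conv3x3_at(feature, kernel, zero_offsets, 2, 2)
deformed_output = deformable_conv3x3_at(feature, kernel, shifted, 2, 2)
```

Because the example feature map is linear in the column index, shifting every tap half a pixel to the right raises the averaged output by exactly 0.5, which shows that the offsets change where the kernel looks rather than what weights it applies.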

Highlights

  • Object detection is one of the main tasks in remote sensing

  • We focused on the top three object detection methods outlined in Table 1: deformable region-based fully convolutional networks (R-FCN) with aspect ratio constrained non maximum suppression (arcNMS), deformable R-FCN, and R-P-Faster R-CNN trained on the VGG16 model in single fine-tuning mode

  • An end-to-end deformable convolutional neural network structure is presented for modeling geometric variations in very high resolution (VHR) remote sensing objects
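The text here does not spell out how the aspect ratio constraint in arcNMS is applied, so the following Python sketch is only one plausible reading: boxes whose long-side to short-side ratio exceeds a threshold (the line-like proposals) are discarded, and standard greedy NMS runs on the rest. The function names, the `[x1, y1, x2, y2]` box format, and the default thresholds `max_aspect_ratio=4.0` and `iou_threshold=0.5` are assumptions made for illustration, not values taken from the paper.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Standard greedy NMS; returns kept indices in descending score order."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

def arc_nms(boxes, scores, max_aspect_ratio=4.0, iou_threshold=0.5):
    """Assumed arcNMS variant: drop line-like boxes, then run greedy NMS.

    max_aspect_ratio is a hypothetical threshold on long side / short side.
    """
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    valid = (widths > 0) & (heights > 0)
    ratio = np.full(len(boxes), np.inf)   # degenerate boxes are always dropped
    ratio[valid] = np.maximum(
        widths[valid] / heights[valid], heights[valid] / widths[valid]
    )
    candidates = np.flatnonzero(ratio <= max_aspect_ratio)
    kept = nms(boxes[candidates], scores[candidates], iou_threshold)
    return [int(candidates[k]) for k in kept]

# Illustrative proposals: index 2 is a high-scoring but line-like box.
boxes = np.array([
    [0, 0, 10, 10],
    [1, 1, 11, 11],
    [0, 0, 100, 2],
    [50, 50, 60, 60],
], dtype=float)
scores = np.array([0.9, 0.8, 0.95, 0.7])

plain_keep = nms(boxes, scores)
arc_keep = arc_nms(boxes, scores)
```

In this example, plain NMS keeps the line-like box because it has the highest score and overlaps little with the others, while the constrained variant removes it before suppression.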

Summary

Introduction

Object detection is one of the main tasks in remote sensing. The development of very high resolution remote sensing images provides more detailed information about geo-spatial objects, including diversity in scale, orientation and shape. The traditional object detection framework has three major components: feature representation, classification and localization. Researchers have modeled geometric variations with two methods, both based on feature representation. The first method adds geometric priors to the training samples, usually by manually rotating or otherwise transforming the training objects in two-dimensional (2D) or three-dimensional (3D) space [5]. The second method extracts transformation-invariant features. By searching for such invariant descriptors, researchers created the scale-invariant feature transform (SIFT) [6] and histograms of oriented gradients (HOG) [7], which are widely used in computer vision and other image-related areas.
