Geospatial Object Detection in High Resolution Satellite Images Based on Multi-Scale Convolutional Neural Network

Wei Guo, Guang Hua, Wen Yang, Haijian Zhang

doi:10.3390/rs10010131

Abstract

Daily acquisition of large amounts of aerial and satellite images has facilitated subsequent automatic interpretations of these images. One such interpretation is object detection. Despite the great progress made in this domain, the detection of multi-scale objects, especially small objects in high resolution satellite (HRS) images, has not been adequately explored. As a result, the detection performance turns out to be poor. To address this problem, we first propose a unified multi-scale convolutional neural network (CNN) for geospatial object detection in HRS images. It consists of a multi-scale object proposal network and a multi-scale object detection network, both of which share a multi-scale base network. The base network produces feature maps with different receptive fields, each responsible for objects of a different scale. Then, we use the multi-scale object proposal network to generate high quality object proposals from the feature maps. Finally, we use these object proposals with the multi-scale object detection network to train the object detector. Comprehensive evaluations on a publicly available remote sensing object detection dataset and comparisons with several state-of-the-art approaches demonstrate the effectiveness of the presented method. The proposed method achieves the best mean average precision (mAP) value of 89.6% and runs at 10 frames per second (FPS) on a GTX 1080Ti GPU.

Highlights

The rapid development of remote sensing technologies has created a large amount of high-quality satellite and aerial images for research and investigation
If the area overlap ratio (intersection over union, IoU) between several detected bounding boxes and the same ground truth is larger than 0.5, only the bounding box with the largest IoU is considered a true positive (TP); the others are considered false positives (FP)
To evaluate the proposed multi-scale convolutional neural network (CNN) quantitatively, we compared it with three state-of-the-art methods and four state-of-the-art CNN-based methods: (1) the bag of words (BoW) feature based method in which each image region is represented as a histogram of visual words generated by the k-means algorithm [57]; (2) the spatial sparse coding BoW (SSCBoW) feature based model in which visual words are generated by the sparse coding algorithm [36]; (3) the collection of part detectors (COPD) based method which is composed of 45 seed-based part detectors trained in histogram of oriented gradients (HOG) feature space
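The overlap criterion stated in the highlights (IoU larger than 0.5, with only the best-overlapping box counted as a TP) can be sketched as follows. This is a minimal illustration and not the authors' evaluation code: the function names `iou` and `label_detections`, the `(x1, y1, x2, y2)` box format, and the choice to leave each detection matched to at most one ground truth are assumptions made for the example. Common benchmark protocols additionally rank detections by confidence score, which is omitted here.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_detections(detections, ground_truth, threshold=0.5):
    """Label each detection "TP" or "FP" (hypothetical helper).

    For every ground-truth box, among the detections whose IoU exceeds
    the threshold, only the one with the largest IoU becomes a TP.
    All remaining detections stay FP.
    """
    labels = ["FP"] * len(detections)
    for gt in ground_truth:
        best_idx, best_iou = None, threshold
        for i, det in enumerate(detections):
            if labels[i] == "TP":
                continue  # assumption: a detection matches one ground truth at most
            overlap = iou(det, gt)
            if overlap > best_iou:
                best_idx, best_iou = i, overlap
        if best_idx is not None:
            labels[best_idx] = "TP"
    return labels


if __name__ == "__main__":
    gt_boxes = [(0, 0, 10, 10)]
    det_boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
    print(label_detections(det_boxes, gt_boxes))
```

In this toy case the first two detections both overlap the ground truth by more than 0.5, so only the one with the larger IoU is kept as a TP, while the duplicate and the non-overlapping box are counted as FP.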

Summary

Introduction

The rapid development of remote sensing technologies has created a large amount of high-quality satellite and aerial images for research and investigation. Automated object detection in HRS images is a core requirement for large-scale scene understanding and semantic information extraction [2]. Considerable efforts have been made to develop methods for detecting different types of objects in satellite and aerial images [3], such as buildings [4,5], storage tanks [6,7], vehicles [8,9], and airplanes [10,11,12]. Traditional approaches to the object detection problem, based on either coding of handcrafted features or unsupervised feature learning, can only generate shallow- to mid-level features with limited representational ability [14,15]. With the rapid development of convolutional neural networks (CNNs), several design variations using region-based CNNs have generated the state-of-the-art performance against

Methods

Results

Discussion

Conclusion