Abstract
Object detection in very-high-resolution (VHR) remote sensing imagery has attracted considerable attention in automatic image interpretation. Region-based convolutional neural networks (CNNs) have been widely adopted in this domain: they first generate candidate regions and then classify and localize the objects within those regions. However, the very large images, complex backgrounds, and uneven size and quantity distribution of training samples make detection challenging, especially for small and dense objects. To address these problems, this paper proposes an effective region-based object detection framework for VHR remote sensing imagery, the Double Multi-scale Feature Pyramid Network (DM-FPN), which exploits the inherent multi-scale pyramidal features and combines strong-semantic, low-resolution features with weak-semantic, high-resolution features. DM-FPN consists of a multi-scale region proposal network and a multi-scale object detection network; the two modules share convolutional layers and can be trained end-to-end. We propose several multi-scale training strategies to increase the diversity of the training data and overcome the size restrictions on input images. We also propose multi-scale inference and adaptive categorical non-maximum suppression (ACNMS) strategies to improve detection performance, especially for small and dense objects. Extensive experiments and comprehensive evaluations on the large-scale DOTA dataset demonstrate the effectiveness of the proposed framework, which achieves a mean average precision (mAP) of 0.7927 on the validation set and a best mAP of 0.793 on the test set.
Highlights
Object detection in very-high-resolution (VHR) optical remote sensing imagery has attracted increasing attention
Each scale is the pixel size of a patch’s shortest side, and the network uniformly selects one scale at random for each training sample
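The per-sample scale selection described above can be sketched as follows. This is a minimal illustration: the scale set and the function name are hypothetical, not taken from the paper.

```python
import random

# Hypothetical shortest-side scale set; the paper samples one scale
# uniformly at random per training sample.
SCALES = [600, 800, 1000]

def pick_scale_and_resize(width, height, scales=SCALES, rng=random):
    """Uniformly pick a shortest-side scale and return the resized
    dimensions, preserving the aspect ratio of the patch."""
    scale = rng.choice(scales)
    short = min(width, height)
    ratio = scale / short
    new_w, new_h = round(width * ratio), round(height * ratio)
    return scale, (new_w, new_h)
```

At training time the resized dimensions would then drive the actual image interpolation; only the scale-sampling logic is shown here.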
In previous multi-class object detection work [3,4,33], the non-maximum suppression (NMS) threshold is the same for all categories, but we find that using different NMS thresholds for different categories, based on the category intensity (CI), can improve detection accuracy to a certain extent
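A minimal sketch of NMS with a per-category threshold follows. The category names and threshold values are illustrative assumptions; in ACNMS the paper derives the thresholds from the category intensity rather than hard-coding them.

```python
import numpy as np

# Illustrative per-category IoU thresholds (NOT the paper's values):
# the ACNMS idea is that denser categories tolerate a higher threshold.
CLASS_NMS_THRESH = {"small-vehicle": 0.45, "plane": 0.30}

def iou(box, boxes):
    """IoU of one box against an array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    a = (box[2] - box[0]) * (box[3] - box[1])
    b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (a + b - inter + 1e-9)

def nms_per_class(boxes, scores, thresh):
    """Greedy NMS for one category at the given IoU threshold."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Drop boxes overlapping the kept box beyond the threshold.
        order = rest[iou(boxes[i], boxes[rest]) <= thresh]
    return keep
```

At inference, one would run `nms_per_class` separately for each category, looking up its threshold in `CLASS_NMS_THRESH`.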
Summary
Object detection in very-high-resolution (VHR) optical remote sensing imagery has attracted increasing attention. Traditional methods first extract features of the object (e.g., histogram of oriented gradients (HOG) [15], bag of words (BoW) [16], sparse representation (SR)-based features [17]), then perform feature fusion and dimension reduction to obtain a concise representation. These features are fed into a classifier (e.g., support vector machine (SVM) [18], AdaBoost [19], conditional random field (CRF) [20]) trained with a large amount of data. Because these methods rely on hand-engineered features, they struggle to process remote sensing images efficiently in the context of big data. Moreover, hand-engineered features can only detect specific targets; when applied to other objects, the detection results are unsatisfactory [1]
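As a toy illustration of the hand-engineered-feature step in this pipeline, a HOG-flavoured orientation histogram can be computed with NumPy. This is a simplified sketch, not the full HOG descriptor of [15] (which adds cell/block decomposition and block normalisation).

```python
import numpy as np

def hog_like_feature(img, n_bins=9):
    """Toy orientation histogram over a whole patch: gradient magnitudes
    are accumulated into unsigned-orientation bins, then L1-normalised."""
    gy, gx = np.gradient(img.astype(float))        # per-pixel gradients
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)        # unsigned angle in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (hist.sum() + 1e-9)              # L1-normalised feature
```

Such fixed-length vectors are what a classifier like an SVM would then consume, one vector per candidate window.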