Abstract

Remote sensing imagery typically comprises varied background contexts and complex objects. Global average pooling is a popular choice for connecting the convolutional and fully connected (FC) layers in deep convolutional networks. This article equips networks with an alternative pooling strategy, deep object-centric pooling (DOCP), which pools convolutional features while accounting for the location of an object within the scene image. The proposed DOCP network structure consists of two steps: inferring the object's location, then pooling the foreground and background features separately to generate an object-level representation. Specifically, a spatial context module is presented to learn the location of the object of interest in the scene image. The convolutional feature maps are then pooled separately over the foreground and background of the object. Finally, an FC layer concatenates these pooled features and is followed by a batch normalization layer, a dropout layer, and a softmax layer. Two challenging datasets are employed to validate our approach. The experimental results demonstrate that the proposed DOCP-net outperforms the corresponding pooling methods and achieves better classification performance than other pretrained convolutional neural network-based scene classification methods.
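The second step above (separate foreground/background pooling followed by concatenation) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: here the binary foreground mask is assumed to be given, whereas in DOCP it is produced by the learned spatial context module.

```python
import numpy as np

def object_centric_pool(features, fg_mask):
    """Pool conv features separately over foreground and background.

    features: (C, H, W) convolutional feature maps.
    fg_mask:  (H, W) binary mask, 1 = estimated object region
              (assumed given here; DOCP infers it with a spatial
              context module).
    Returns a (2*C,) vector: [foreground average || background average],
    which would then feed the FC / batch-norm / dropout / softmax stack.
    """
    fg = fg_mask.astype(bool)
    bg = ~fg
    # Average-pool each channel inside and outside the mask,
    # guarding against an empty region.
    fg_vec = features[:, fg].mean(axis=1) if fg.any() else np.zeros(features.shape[0])
    bg_vec = features[:, bg].mean(axis=1) if bg.any() else np.zeros(features.shape[0])
    return np.concatenate([fg_vec, bg_vec])
```

Note that with an all-ones mask the foreground half reduces to ordinary global average pooling, which is the baseline DOCP is compared against.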

Highlights

  • The rapid development of remote sensing (RS) technology, benefiting from advances in image acquisition equipment, has produced numerous satellite images with high spatial and high spectral resolution [1], [2], [3], [4]

  • Spurred by the deficiencies of current pooling methods and inspired by object-centric spatial pooling [19] in image classification, this article proposes a deep object-centric pooling (DOCP) approach to derive location information and improve RS scene classification performance

  • The DOCP method performs comparably to DDRL-AM [59], which involves fine-tuning the convolutional neural network (CNN) model and fusing the feature maps extracted from the CNN with those from a spatial feature transformer model

Summary

INTRODUCTION

The rapid development of remote sensing (RS) technology, benefiting from advances in image acquisition equipment, has produced numerous satellite images with high spatial and high spectral resolution [1], [2], [3], [4]. Unlike object-centric spatial pooling, which adopts low-level descriptors, the proposed method utilizes convolutional feature maps for scene classification. The literature offers several works on learning object-category localization in a weakly supervised manner (using only image-level class labels) [20], [21], [22], [23]. These methods are designed for detection tasks, and embedding them into a neural network for RS scene classification is not trivial. Since the global average (GA) pooling layer enables a CNN trained on image-level labels to localize objects [24] (Fig. 3), a spatial context module is proposed to obtain approximate location information for objects in scene images. DOCP networks extending several pre-trained deep CNN models achieve promising results on two benchmark datasets
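The localization ability of GA pooling cited above ([24]) rests on a simple observation: after GA pooling, the FC weights for a class tell us how much each feature map contributes to that class, so weighting the maps by those weights yields a class activation map whose peak indicates the object's rough location. A minimal NumPy sketch of this idea (a simplified class-activation-map computation, not the article's spatial context module itself):

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Weight each conv feature map by its FC weight for one class.

    features:   (C, H, W) conv feature maps before GA pooling.
    fc_weights: (num_classes, C) weights of the FC layer that follows
                GA pooling (hypothetical shapes for illustration).
    Returns an (H, W) activation map; high values mark spatial
    positions that drove the class score, i.e. a coarse object location.
    """
    w = fc_weights[class_idx]                          # (C,)
    # Sum_k w_k * f_k(x, y) over channels k.
    return np.tensordot(w, features, axes=([0], [0]))  # (H, W)
```

Thresholding such a map is one plausible way to obtain the binary foreground region that object-centric pooling then operates on.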

Pooling processes in CNN
Weakly Supervised Localization
PROPOSED METHOD
Architecture overview
Deep Object-Centric Pooling
Experimental data sets
Hyperparameter Setting
Effect of combination of the DOCP and GA pooling layers
Compared With Other Pre-trained CNN-Based Methods
Method
Qualitative Visualization and Analysis
Findings
CONCLUSION