Looking Closer at the Scene: Multiscale Representation Learning for Remote Sensing Image Scene Classification

Qi Wang, Wei Huang, Xuelong Li, Zhitong Xiong

doi:10.1109/tnnls.2020.3042276

Abstract

Remote sensing image scene classification has attracted considerable attention because of its wide range of applications. Although convolutional neural network (CNN)-based methods for scene classification have achieved excellent results, the large variation in scale of the features and objects in remote sensing images limits further improvement in classification performance. To address this issue, we present a multiscale representation for scene classification, realized by a global-local two-stream architecture. The architecture has two branches, a global stream and a local stream, which extract global features from the whole image and local features from its most important area, respectively. To locate the most important area in the whole image using only image-level labels, a weakly supervised key area detection strategy, structured key area localization (SKAL), is designed to connect the two streams. To verify the effectiveness of the proposed SKAL-based two-stream architecture, we conduct comparative experiments with three widely used CNN models (AlexNet, GoogLeNet, and ResNet18) on four public remote sensing image scene classification data sets and achieve state-of-the-art results on all four. Our code is available at https://github.com/hw2hwei/SKAL.
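
The global-local data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' SKAL implementation: the abstract does not specify how the key area is found, so the sketch substitutes a simple stand-in (the fixed-size window with the largest summed energy), assumes the energy map has the same resolution as the image, and fuses the two streams by averaging their class scores. All names (`locate_key_area`, `crop`, `two_stream_predict`) and the two scoring callables are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Grid = List[List[float]]
Scorer = Callable[[Grid], Sequence[float]]


def locate_key_area(energy: Grid, win_h: int, win_w: int) -> Tuple[int, int]:
    """Return the (row, col) top-left corner of the win_h x win_w window
    with the largest total energy. A stand-in for key area localization,
    not the SKAL strategy itself."""
    rows, cols = len(energy), len(energy[0])
    # Summed-area table with a zero border, so any window sum costs O(1).
    sat = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = (
                energy[r][c] + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]
            )
    best, best_pos = float("-inf"), (0, 0)
    for r in range(rows - win_h + 1):
        for c in range(cols - win_w + 1):
            total = (
                sat[r + win_h][c + win_w]
                - sat[r][c + win_w]
                - sat[r + win_h][c]
                + sat[r][c]
            )
            if total > best:  # ties keep the first window found
                best, best_pos = total, (r, c)
    return best_pos


def crop(image: Grid, top: int, left: int, h: int, w: int) -> Grid:
    """Cut an h x w patch out of a 2-D grid."""
    return [row[left:left + w] for row in image[top:top + h]]


def two_stream_predict(
    image: Grid,
    energy: Grid,
    global_scores: Scorer,
    local_scores: Scorer,
    win_h: int,
    win_w: int,
) -> int:
    """Score the whole image (global stream) and the located key area
    (local stream), average the class scores, and return the arg-max class."""
    top, left = locate_key_area(energy, win_h, win_w)
    patch = crop(image, top, left, win_h, win_w)
    g = global_scores(image)
    l = local_scores(patch)
    fused = [(a + b) / 2 for a, b in zip(g, l)]  # assumed fusion rule
    return fused.index(max(fused))
```

In practice, `global_scores` and `local_scores` would be the classification heads of two CNN backbones, and the energy map would come from the global stream's activations; here they are left as plain callables so the sketch stays self-contained.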

Full Text