Abstract

Remote sensing image scene classification (RSISC) is an active task in the remote sensing community and has attracted great attention due to its wide applications. Recently, deep convolutional neural network (CNN)-based methods have achieved a remarkable breakthrough in the performance of remote sensing image scene classification. However, feature representations are often still not discriminative enough, mainly because of the inter-class similarity and intra-class diversity characteristic of scene images. In this paper, we propose an efficient end-to-end local-global-fusion feature extraction (LGFFE) network for a more discriminative feature representation. Specifically, global and local features are extracted from the channel and spatial dimensions, respectively, based on a high-level feature map from deep CNNs. For the local features, a novel recurrent neural network (RNN)-based attention module is first proposed to capture the spatial layout and context information across different regions. Gated recurrent units (GRUs) are then exploited to generate an importance weight for each region by taking a sequence of features from image patches as input. A reweighted regional feature representation can be obtained by focusing on the key regions. The final feature representation is then acquired by fusing the local and global features. The whole process of feature extraction and feature fusion can be trained in an end-to-end manner. Finally, extensive experiments have been conducted on four public and widely used datasets, and the experimental results show that our method LGFFE outperforms baseline methods and achieves state-of-the-art results.
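The fusion scheme described above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the authors' implementation: the module name `LGFFESketch`, the hidden size, and the choice of global average pooling for the channel-wise global feature are all assumptions made for the example; the GRU-plus-softmax scoring of spatial regions mirrors the attention reweighting the abstract describes.

```python
import torch
import torch.nn as nn

class LGFFESketch(nn.Module):
    """Illustrative sketch of local-global feature fusion (assumed names,
    not the paper's official code)."""
    def __init__(self, channels=512, hidden=128):
        super().__init__()
        # GRU reads the sequence of regional features, one per spatial location,
        # so each region's score depends on context from other regions
        self.gru = nn.GRU(channels, hidden, batch_first=True)
        # Scalar importance score per region, normalized with softmax
        self.score = nn.Linear(hidden, 1)

    def forward(self, fmap):  # fmap: (B, C, H, W) high-level CNN feature map
        b, c, h, w = fmap.shape
        regions = fmap.flatten(2).transpose(1, 2)        # (B, H*W, C) patch sequence
        ctx, _ = self.gru(regions)                       # context across regions
        weights = torch.softmax(self.score(ctx), dim=1)  # (B, H*W, 1) region weights
        local = (weights * regions).sum(dim=1)           # reweighted local feature (B, C)
        glob = fmap.mean(dim=(2, 3))                     # global avg-pooled feature (B, C)
        return torch.cat([local, glob], dim=1)           # fused descriptor (B, 2C)

# Example: a 7x7x512 feature map (e.g. from a ResNet backbone) for a batch of 2
x = torch.randn(2, 512, 7, 7)
feat = LGFFESketch()(x)
print(feat.shape)  # torch.Size([2, 1024])
```

In practice the fused descriptor would be fed to a classifier head and the whole pipeline trained end-to-end, as the abstract states.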

Highlights

  • Rapid technological advancement and development of instruments for earth observation are responsible for the generation of numerous high-resolution remote sensing images

  • Remote sensing image scene classification (RSISC) focuses on categorizing acquired images into the pre-defined classes according to the semantic content of scene images, which has been extensively explored due to the important role it plays in many applications

  • We propose a novel end-to-end local-global-fusion feature extraction (LGFFE) network for RSISC

Introduction

Rapid technological advancement and the development of instruments for earth observation have generated numerous high-resolution remote sensing images. There has been a surge in demand for effective understanding of their semantic content and for accurate identification and classification of land use and land cover scenes [1,2,3]. Remote sensing image scene classification (RSISC) focuses on categorizing acquired images into pre-defined classes according to the semantic content of the scene images, and it has been extensively explored due to the important role it plays in many applications. As an active topic in the remote sensing area, RSISC has received growing attention, and numerous methods have been proposed to solve this task. Since the proposal of AlexNet [4] for classifying the ImageNet dataset, which contains 1000 categories and more than one million natural images, deep learning-based methods have been proven to outperform handcrafted-feature-based ones and to show superior potential [5,6].
