Abstract

Deep convolutional neural networks have become an indispensable method in remote sensing image scene classification because of their powerful feature extraction capabilities. However, current models remain limited in their ability to extract multiscale and global features of surface objects in complex scenes. We propose GCSANet, a framework based on global context spatial attention (GCSA) and densely connected convolutional networks, to extract multiscale global scene features. The mixup operation augments remote sensing images with spatially mixed data, rendering the discrete sample space continuous and improving smoothness in the neighborhood of the data space. The densely connected backbone network extracts the features of multiscale surface objects and strengthens their internal dense connections. GCSA is introduced into the densely connected backbone to encode the context information of the remote sensing scene image into the local features. Experiments were performed on four remote sensing scene datasets to evaluate the performance of GCSANet. GCSANet achieved the highest classification accuracy on the AID and NWPU datasets and the second-best performance on the UC Merced dataset, indicating that it can effectively extract the global features of remote sensing images. In addition, GCSANet achieved the highest classification accuracy on the constructed mountain image scene dataset. These results show that GCSANet can effectively extract multiscale global scene features from complex remote sensing scenes. The source code can be found at https://github.com/ShubingOuyangcug/GCSANet.
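The mixup operation mentioned above blends pairs of training images and their labels with a Beta-distributed weight, which is what makes the discrete sample space continuous. A minimal NumPy sketch of the standard mixup formulation is shown below; the function name, `alpha` default, and toy data are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two samples and their one-hot labels with a Beta-sampled weight."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing coefficient lambda in (0, 1)
    x = lam * x1 + (1.0 - lam) * x2       # pixel-wise convex blend of the images
    y = lam * y1 + (1.0 - lam) * y2       # same convex blend of the label vectors
    return x, y, lam

# Toy example: two 2x2 "images" with one-hot labels for a 3-class problem.
a, b = np.ones((2, 2)), np.zeros((2, 2))
ya, yb = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
x, y, lam = mixup(a, ya, b, yb)
# The mixed label remains a valid probability distribution.
```

Because the labels are blended with the same coefficient as the pixels, the network is trained toward linear behavior between samples, which is the smoothness property the abstract refers to.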

Highlights

  • Remote sensing image scene classification marks remote sensing scene images with specific high-level semantic categories, which can be effectively analyzed to obtain high-level semantic information [1]

  • GCSANet extracts the features of multiscale surface objects, establishes internal feature connections, and introduces a spatial-domain attention mechanism to extract the global features of remote sensing scene images

  • To evaluate the performance of GCSANet, it was compared against 15 recent methods on the UC Merced (UCM) dataset
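The highlights describe a spatial-domain attention mechanism that injects global context into local features. The paper's exact GCSA design is not reproduced here; the sketch below is a generic global-context spatial attention block (in the spirit of GCNet-style context modeling), with hypothetical weight matrices `w_key` and `w_transform`: a softmax over spatial positions pools the feature map into one context vector, which is transformed and added back to every position.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a flat array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def global_context_attention(feat, w_key, w_transform):
    """feat: (C, H, W); w_key: (1, C); w_transform: (C, C).
    Returns feat with a transformed global context vector added at every position."""
    C, H, W = feat.shape
    flat = feat.reshape(C, H * W)                  # flatten spatial dimensions
    attn = softmax(w_key @ flat)                   # (1, HW): weights over positions
    context = flat @ attn.ravel()                  # (C,): attention-pooled context
    out = flat + (w_transform @ context)[:, None]  # broadcast context to all positions
    return out.reshape(C, H, W)
```

Because the same context vector is added everywhere, every local feature is conditioned on the whole scene, which is one common way to realize the "encode context information into local features" idea.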


Introduction

Remote sensing image scene classification marks remote sensing scene images with specific high-level semantic categories, which can be effectively analyzed to obtain high-level semantic information [1]. In recent years, it has become a prominent research area in the field of high-resolution remote sensing. Researchers have long been devoted to extracting various effective remote sensing image feature representations to improve classification accuracy [2]. These features can be divided into three major categories: manual features, middle-level features, and deep-level features.
