Abstract

As one of the fundamental tasks in remote sensing (RS) image understanding, multi-label remote sensing image scene classification (MLRSSC) is attracting increasing research interest. Human beings can easily perform MLRSSC by examining the visual elements contained in the scene and the spatio-topological relationships of these visual elements. However, most of existing methods are limited by only perceiving visual elements but disregarding the spatio-topological relationships of visual elements. With this consideration, this paper proposes a novel deep learning-based MLRSSC framework by combining convolutional neural network (CNN) and graph neural network (GNN), which is termed the MLRSSC-CNN-GNN. Specifically, the CNN is employed to learn the perception ability of visual elements in the scene and generate the high-level appearance features. Based on the trained CNN, one scene graph for each scene is further constructed, where nodes of the graph are represented by superpixel regions of the scene. To fully mine the spatio-topological relationships of the scene graph, the multi-layer-integration graph attention network (GAT) model is proposed to address MLRSSC, where the GAT is one of the latest developments in GNN. Extensive experiments on two public MLRSSC datasets show that the proposed MLRSSC-CNN-GNN can obtain superior performance compared with the state-of-the-art methods.

Highlights

  • Single-label remote sensing (RS) image scene classification considers the image scene as the basic interpretation unit and aims to assign one semantic category to the RS image scene according to its visual and contextual content [1,2,3]

  • Motivated by the notion that the visual elements in the image can be perceived by the convolutional neural network (CNN) and the topological relationships among graph-structured data can be learned by the graph neural network (GNN), we propose a novel multi-label remote sensing image scene classification (MLRSSC) framework by combining the CNN and the GNN, which is termed the MLRSSC-CNN-GNN

  • We can observe that our proposed MLRSSC-CNN-GNN via the multi-layer-integration graph attention network (GAT) achieves the highest scores for Recall, F1-Score and F2-Score

Read more

Summary

Introduction

Single-label remote sensing (RS) image scene classification considers the image scene (i.e., one image block) as the basic interpretation unit and aims to assign one semantic category to the RS image scene according to its visual and contextual content [1,2,3]. Due to its extensive applications in object detection [4,5,6,7], image retrieval [8,9,10], etc., single-label RS image scene classification has attracted extensive attention. Single-label RS scene classification has reached saturation accuracy [15]. Compared with single-label RS image scene classification, multi-label remote sensing image scene classification (MLRSSC) is a more realistic task. MLRSSC aims to predict multiple semantic labels to describe an RS image scene. How to effectively extract discriminative semantic representations to distinguish multiple categories is still an open problem that deserves much more exploration

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call