Abstract
As one of the fundamental tasks in remote sensing (RS) image understanding, multi-label remote sensing image scene classification (MLRSSC) is attracting increasing research interest. Human beings can easily perform MLRSSC by examining the visual elements contained in the scene and the spatio-topological relationships among these visual elements. However, most existing methods perceive only the visual elements and disregard their spatio-topological relationships. With this consideration, this paper proposes a novel deep learning-based MLRSSC framework that combines a convolutional neural network (CNN) and a graph neural network (GNN), which is termed the MLRSSC-CNN-GNN. Specifically, the CNN is employed to learn to perceive the visual elements in the scene and to generate high-level appearance features. Based on the trained CNN, one scene graph is then constructed for each scene, where the nodes of the graph are represented by superpixel regions of the scene. To fully mine the spatio-topological relationships of the scene graph, a multi-layer-integration graph attention network (GAT) model is proposed to address MLRSSC, where the GAT is one of the latest developments in GNNs. Extensive experiments on two public MLRSSC datasets show that the proposed MLRSSC-CNN-GNN obtains superior performance compared with the state-of-the-art methods.
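The pipeline described in the abstract (superpixel nodes, a scene graph, graph attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that two superpixels are connected when they share a border, that node features are the mean of a CNN feature map over each superpixel, and that attention follows the standard single-head GAT formulation; the multi-layer integration is not reproduced. The function names `region_adjacency`, `node_features`, and `gat_layer` are hypothetical.

```python
import numpy as np

def region_adjacency(labels):
    """Boolean adjacency (with self-loops) of superpixels sharing a border.

    Assumption: 4-connectivity; `labels` is an (H, W) integer superpixel map.
    """
    n = int(labels.max()) + 1
    adj = np.eye(n, dtype=bool)
    # Compare every pixel with its right neighbour and its lower neighbour.
    pairs = ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :]))
    for a, b in pairs:
        diff = a != b
        adj[a[diff], b[diff]] = True
        adj[b[diff], a[diff]] = True
    return adj

def node_features(feature_map, labels):
    """Mean feature vector per superpixel; `feature_map` has shape (H, W, C).

    Assumption: pooling by averaging; the paper may pool differently.
    """
    n = int(labels.max()) + 1
    return np.stack([feature_map[labels == k].mean(axis=0) for k in range(n)])

def gat_layer(h, adj, W, a, slope=0.2):
    """Single-head graph attention layer (standard GAT formulation)."""
    z = h @ W                                   # projected features, (N, F')
    f = z.shape[1]
    # e_ij = LeakyReLU(a_src . z_i + a_dst . z_j), equal to a . [z_i || z_j]
    e = (z @ a[:f])[:, None] + (z @ a[f:])[None, :]
    e = np.where(e > 0, e, slope * e)
    e = np.where(adj, e, -np.inf)               # attend only to neighbours
    e = e - e.max(axis=1, keepdims=True)        # numerical stability
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)   # softmax over each neighbourhood
    return alpha @ z, alpha

# Toy scene: a 4x4 image split into four quadrant superpixels.
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 3, 3],
                   [2, 2, 3, 3]])
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(4, 4, 8))        # stand-in for CNN features
adj = region_adjacency(labels)
h = node_features(feature_map, labels)
W = rng.normal(size=(8, 5))                     # random, untrained weights
a = rng.normal(size=10)
out, alpha = gat_layer(h, adj, W, a)
```

In this toy scene the diagonal quadrants (0 and 3, 1 and 2) touch only at a corner, so under 4-connectivity they are not neighbours and receive zero attention weight.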
Highlights
Single-label remote sensing (RS) image scene classification considers the image scene as the basic interpretation unit and aims to assign one semantic category to the RS image scene according to its visual and contextual content [1,2,3]
Motivated by the notion that the visual elements in the image can be perceived by the convolutional neural network (CNN) and the topological relationships among graph-structured data can be learned by the graph neural network (GNN), we propose a novel multi-label remote sensing image scene classification (MLRSSC) framework by combining the CNN and the GNN, which is termed the MLRSSC-CNN-GNN
We can observe that our proposed MLRSSC-CNN-GNN via the multi-layer-integration graph attention network (GAT) achieves the highest scores for Recall, F1-Score and F2-Score
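For readers unfamiliar with the reported metrics, the following is a minimal sketch of how Recall, F1-Score and F2-Score can be computed for multi-label predictions. It assumes example-based averaging (scores computed per image, then averaged), which is a common convention but is not confirmed by the text above; the function name `example_based_scores` is hypothetical. The F-beta score is (1 + beta^2) * P * R / (beta^2 * P + R), so beta = 2 weights recall more heavily than precision.

```python
import numpy as np

def example_based_scores(y_true, y_pred, beta=1.0, eps=1e-12):
    """Mean per-sample precision, recall and F-beta for binary label matrices.

    y_true, y_pred: arrays of shape (n_samples, n_labels) with 0/1 entries.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = (y_true & y_pred).sum(axis=1)                    # correct labels per sample
    precision = tp / np.maximum(y_pred.sum(axis=1), eps)  # eps avoids division by zero
    recall = tp / np.maximum(y_true.sum(axis=1), eps)
    denom = beta ** 2 * precision + recall
    f_beta = (1 + beta ** 2) * precision * recall / np.maximum(denom, eps)
    return precision.mean(), recall.mean(), f_beta.mean()

# Two images, four candidate labels each (illustrative values only).
y_true = [[1, 1, 0, 0], [1, 0, 1, 0]]
y_pred = [[1, 0, 0, 0], [1, 1, 1, 0]]
p, r, f1 = example_based_scores(y_true, y_pred, beta=1.0)
_, _, f2 = example_based_scores(y_true, y_pred, beta=2.0)
```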
Summary
Single-label remote sensing (RS) image scene classification considers the image scene (i.e., one image block) as the basic interpretation unit and aims to assign one semantic category to the RS image scene according to its visual and contextual content [1,2,3]. Owing to its wide range of applications, such as object detection [4,5,6,7] and image retrieval [8,9,10], single-label RS image scene classification has attracted considerable attention, and its accuracy has reached saturation [15]. Compared with single-label RS image scene classification, multi-label remote sensing image scene classification (MLRSSC) is a more realistic task: it aims to predict multiple semantic labels that together describe an RS image scene. How to effectively extract discriminative semantic representations to distinguish multiple categories is still an open problem that deserves much more exploration.