Cross-Modal Feature Representation Learning and Label Graph Mining in a Residual Multi-Attentional CNN-LSTM Network for Multi-Label Aerial Scene Classification

Peng Li, Peng Chen, Dezheng Zhang

doi:10.3390/rs14102424


Open Access

https://doi.org/10.3390/rs14102424


Abstract

Aerial scene classification provides valuable information for urban planning and land monitoring. Large remote-sensing images typically contain many object-level semantic classes. This complex label space makes it hard to detect every target and perceive the corresponding semantics in a typical scene, which weakens sensing ability. Worse, preparing a labeled dataset for training deep networks is more difficult when each image carries multiple labels. To mine object-level visual features and exploit label dependency, we propose a novel framework in this article: a Cross-Modal Representation Learning and Label Graph Mining-based Residual Multi-Attentional CNN-LSTM framework (CM-GM framework). In this framework, a residual multi-attentional convolutional neural network extracts object-level image features. Semantic labels are embedded by a language model and then form a label graph, which is further mapped by graph convolutional networks (GCN). With these cross-modal feature representations (image, graph, and text), object-level visual features are enhanced and aligned to the GCN-based label embeddings. The aligned visual signals are then fed into a bi-LSTM subnetwork according to the built label graph. The CM-GM framework maps both visual features and graph-based label representations into a correlated space and uses label dependency efficiently, thus improving the LSTM predictor’s ability. Experimental results show that the proposed CM-GM framework achieves higher accuracy on many multi-label benchmark datasets in the remote sensing field.
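
To make the label-graph step concrete, the following is a minimal pure-Python sketch of one graph-convolution pass over a label graph, followed by scoring an image feature against the resulting label embeddings. It is not the authors' implementation. Several pieces are assumptions introduced only for illustration: the graph is built from label co-occurrence with a hypothetical `threshold` (the abstract does not say how edges are formed), the toy vectors stand in for language-model label embeddings, the identity weight matrix stands in for learned GCN weights, and all function names are made up here.

```python
import math


def cooccurrence_adjacency(label_sets, num_labels, threshold=0.4):
    """Directed label graph: edge i -> j when P(label j | label i) >= threshold.

    Assumption: co-occurrence statistics define the edges. The abstract only
    states that embedded labels "form a label graph".
    """
    count = [0] * num_labels
    pair = [[0] * num_labels for _ in range(num_labels)]
    for labels in label_sets:
        for i in labels:
            count[i] += 1
            for j in labels:
                if i != j:
                    pair[i][j] += 1
    adj = [[0.0] * num_labels for _ in range(num_labels)]
    for i in range(num_labels):
        for j in range(num_labels):
            if count[i] and pair[i][j] / count[i] >= threshold:
                adj[i][j] = 1.0
    return adj


def row_normalize_with_self_loops(adj):
    """Return D^-1 (A + I): add self-loops, then make each row sum to 1."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    return [[v / sum(row) for v in row] for row in a_hat]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_norm, h, w):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def label_scores(visual_feature, label_embeddings):
    """Sigmoid of the dot product between an image feature and each label embedding."""
    return [1.0 / (1.0 + math.exp(-sum(v * e for v, e in zip(visual_feature, emb))))
            for emb in label_embeddings]


# Toy data: 3 hypothetical labels (0 = water, 1 = ship, 2 = tree), 5 annotated images.
annotations = [{0, 1}, {0, 1}, {0}, {2}, {0, 2}]
adjacency = cooccurrence_adjacency(annotations, num_labels=3)
a_norm = row_normalize_with_self_loops(adjacency)

initial_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-in for text embeddings
weights = [[1.0, 0.0], [0.0, 1.0]]                         # stand-in for learned weights
graph_embeddings = gcn_layer(a_norm, initial_embeddings, weights)

image_feature = [0.2, -0.1]  # stand-in for a CNN feature projected to the same space
print(label_scores(image_feature, graph_embeddings))
```

The dot-product scoring is one simple way to place visual features and label embeddings in a shared space; the framework described above instead feeds the aligned visual signals into a bi-LSTM, which this sketch does not model.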


Journal: Remote Sensing	Publication Date: May 18, 2022
Citations: 8	License type: CC BY 4.0


