Abstract

High-resolution remote sensing images are now available thanks to advances in remote sensing technology. In contrast to popular remote sensing tasks such as scene classification, image captioning provides comprehensible information about such images by summarizing the image content in human-readable text. Most existing remote sensing image captioning methods follow deep learning-based encoder-decoder frameworks that use a Convolutional Neural Network or a Recurrent Neural Network as the backbone. Such frameworks show limited capability to analyze sequential data and to cope with the scarcity of captioned remote sensing training images. The recently introduced Transformer architecture exploits self-attention to obtain superior performance on sequence-analysis tasks. Inspired by this, in this work we employ a Transformer as the encoder-decoder for remote sensing image captioning. Moreover, to deal with the limited training data, an auxiliary decoder is used that further helps the encoder during training. The auxiliary decoder is trained for multi-label scene classification, owing to its conceptual similarity to image captioning and its ability to highlight semantic classes. To the best of our knowledge, this is the first work to exploit multi-label classification to improve remote sensing image captioning. Experimental results on the UC Merced caption data set show the efficacy of the proposed method. The implementation details can be found at https://gitlab.lrz.de/ai4eo/captioningMultilabel.
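The following is a minimal PyTorch sketch of the general idea described above, not the authors' implementation (the official code is at the GitLab link): a Transformer encoder-decoder produces caption word logits, while an auxiliary multi-label classification head attached to the shared encoder output provides an additional training signal. All hyperparameters, feature dimensions, and the pooling strategy for the auxiliary branch are illustrative assumptions.

```python
# Illustrative sketch only: Transformer captioning with an auxiliary
# multi-label head sharing the image encoder during training.
import torch
import torch.nn as nn

class CaptioningWithAuxiliaryHead(nn.Module):
    def __init__(self, vocab_size, num_labels, d_model=256, nhead=8, num_layers=3):
        super().__init__()
        # Assumes image features arrive as a sequence of region embeddings
        # (e.g. a CNN feature grid flattened to N vectors of size 2048).
        self.feature_proj = nn.Linear(2048, d_model)
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.caption_head = nn.Linear(d_model, vocab_size)     # word logits
        self.multilabel_head = nn.Linear(d_model, num_labels)  # auxiliary scene labels

    def forward(self, image_feats, caption_tokens):
        src = self.feature_proj(image_feats)                   # (B, N, d_model)
        tgt = self.token_embed(caption_tokens)                 # (B, T, d_model)
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        memory = self.transformer.encoder(src)                 # shared encoder output
        decoded = self.transformer.decoder(tgt, memory, tgt_mask=tgt_mask)
        caption_logits = self.caption_head(decoded)            # (B, T, vocab_size)
        # Auxiliary branch: pooled encoder memory -> multi-label logits.
        label_logits = self.multilabel_head(memory.mean(dim=1))
        return caption_logits, label_logits

# Joint training objective: cross-entropy on caption words plus
# binary cross-entropy on the multi-label scene targets (dummy data below).
model = CaptioningWithAuxiliaryHead(vocab_size=1000, num_labels=17)
feats = torch.randn(2, 49, 2048)               # e.g. a 7x7 CNN feature grid
tokens = torch.randint(0, 1000, (2, 12))       # caption token ids
labels = torch.randint(0, 2, (2, 17)).float()  # multi-label scene targets
cap_logits, lab_logits = model(feats, tokens)
loss = (nn.CrossEntropyLoss()(cap_logits.reshape(-1, 1000), tokens.reshape(-1))
        + nn.BCEWithLogitsLoss()(lab_logits, labels))
```

At inference time only the captioning branch would be used; the auxiliary head exists solely to regularize the encoder when captioned training images are scarce.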
