Abstract

Image captioning in remote sensing can help us understand the inner attributes of objects and the outer relations between different objects. However, existing image captioning algorithms lack the ability to form global representations and cannot capture object relations over long distances. In addition, these algorithms generate captions indiscriminately, without consideration of specific demands. To this end, we propose a pure transformer architecture with a caption type controller for remote sensing image captioning. Specifically, a multi-scale vision transformer is adopted for image representation, where both the global and the detailed content can be captured with multi-head self-attention layers. A transformer decoder is then introduced to successively translate the image features into comprehensive sentences. An optional block, called the caption type controller, is designed to account for the required type of caption through caption type matrix sets selected according to the demands, embedding the learnable sentence feature with the required type. Comparison and ablation experiments conducted on the Remote Sensing Image Captioning Dataset (RSICD) demonstrate that the proposed framework outperforms current state-of-the-art image captioning methods. Experiments conducted on the FloodNet caption dataset further illustrate that the proposed method can effectively generate specific types of captions.
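As a rough illustration of the pipeline described in the abstract, the sketch below wires a transformer encoder (standing in for the multi-scale vision transformer) into a transformer decoder, with an optional caption type embedding added to the decoder input. This is a minimal sketch under assumptions: the class names, dimensions, and the `CaptionTypeController` formulation are hypothetical and are not the authors' implementation.

```python
import torch
import torch.nn as nn


class CaptionTypeController(nn.Module):
    """Optional block: embeds the requested caption type and adds it to the
    learnable sentence feature (hypothetical formulation of the controller)."""

    def __init__(self, num_types: int, d_model: int):
        super().__init__()
        # One learnable embedding per caption type (a stand-in for the caption type matrix set).
        self.type_embedding = nn.Embedding(num_types, d_model)

    def forward(self, sentence_feature: torch.Tensor, type_id: torch.Tensor) -> torch.Tensor:
        # sentence_feature: (batch, seq_len, d_model); type_id: (batch,)
        return sentence_feature + self.type_embedding(type_id).unsqueeze(1)


class TransformerCaptioner(nn.Module):
    """Encoder-decoder sketch: image patch tokens -> transformer decoder -> token logits."""

    def __init__(self, vocab_size: int, d_model: int = 512, num_types: int = 3):
        super().__init__()
        # Stand-in for the multi-scale vision transformer encoder described in the paper.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=6)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=6)
        self.token_embedding = nn.Embedding(vocab_size, d_model)
        self.type_controller = CaptionTypeController(num_types, d_model)
        self.output_proj = nn.Linear(d_model, vocab_size)

    def forward(self, patch_tokens: torch.Tensor,
                caption_tokens: torch.Tensor,
                type_id: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, d_model) image patch embeddings.
        memory = self.encoder(patch_tokens)
        tgt = self.token_embedding(caption_tokens)
        # Condition the decoder input on the requested caption type.
        tgt = self.type_controller(tgt, type_id)
        # Causal mask so each position attends only to earlier tokens.
        seq_len = caption_tokens.size(1)
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.output_proj(out)  # per-token vocabulary logits
```

In this sketch, requesting a specific caption type amounts to passing the corresponding `type_id` at inference time, so the same encoder-decoder weights can produce differently styled captions for one image.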
