Global-Attention-Based Neural Networks for Vision Language Intelligence

Pei Liu, Yingjie Zhou, Dezhong Peng, Dapeng Wu

doi:10.1109/jas.2020.1003402

Abstract

In this paper, we develop a novel global-attention-based neural network (GANN) for vision language intelligence, specifically image captioning (generating a natural-language description of a given image). As in many previous works, our model adopts the encoder-decoder framework. The encoder encodes the region proposal features and extracts a global caption feature using a specially designed module that predicts the caption objects. The decoder generates captions by taking the global caption feature, along with the encoded visual features, as inputs to each attention head of the decoder layer. The global caption feature is introduced to explore the latent contributions of the region proposals to image captioning and to help the decoder focus on the most relevant proposals, so that it extracts more accurate visual features at each time step of caption generation. GANN is implemented by incorporating the global caption feature into the attention-weight calculation of the word prediction process in each head of the decoder layer. In our experiments, we qualitatively analyzed the proposed model and quantitatively evaluated several state-of-the-art schemes equipped with GANN on the MS-COCO dataset. Experimental results demonstrate the effectiveness of the proposed global attention mechanism for image captioning.
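The abstract does not give the exact formulas, so the following is only a minimal sketch, assuming one plausible design. A hypothetical caption-object scorer (the vector w_obj) assigns each region proposal a probability of appearing in the caption; the regions are pooled by those probabilities into a global caption feature g; and every decoder attention head adds a projection of g (the hypothetical matrix Wg) to its query before computing scaled dot-product weights over the regions. All function names, shapes, and the additive-query choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def global_caption_feature(regions, w_obj):
    """Hypothetical encoder side: score each region for 'appears in the caption'
    and pool the regions by those scores into one global caption feature."""
    logits = regions @ w_obj                                   # (N,)
    p = 1.0 / (1.0 + np.exp(-logits))                          # per-region caption-object probability
    g = (p[:, None] * regions).sum(axis=0) / (p.sum() + 1e-8)  # (d,) probability-weighted pooling
    return g, p


def global_attention(h, regions, g, heads):
    """Hypothetical decoder side: multi-head attention over the regions in which
    the global caption feature g is added to every head's query (an assumption)."""
    outputs, weights = [], []
    for Wq, Wk, Wv, Wg in heads:
        q = h @ Wq + g @ Wg                            # (dk,) query biased by g
        k = regions @ Wk                               # (N, dk)
        v = regions @ Wv                               # (N, dk)
        alpha = softmax(k @ q / np.sqrt(q.shape[-1]))  # (N,) attention weights over regions
        outputs.append(alpha @ v)                      # (dk,) attended visual feature
        weights.append(alpha)
    return np.concatenate(outputs), np.stack(weights)


rng = np.random.default_rng(0)
N, d, n_heads = 5, 8, 2
dk = d // n_heads
regions = rng.standard_normal((N, d))  # stand-in for encoded region proposal features
h = rng.standard_normal(d)             # stand-in for the decoder state at one time step
heads = [tuple(rng.standard_normal((d, dk)) for _ in range(4)) for _ in range(n_heads)]

g, p = global_caption_feature(regions, rng.standard_normal(d))
context, alpha = global_attention(h, regions, g, heads)
print(context.shape, alpha.shape)  # (8,) (2, 5)
```

Passing np.zeros(d) as g reduces each head to ordinary scaled dot-product attention over the regions, which makes it easy to compare the attention weights with and without the global feature.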

More From: IEEE/CAA Journal of Automatica Sinica

Journal: IEEE/CAA Journal of Automatica Sinica
Publication Date: Jul 1, 2021
Citations: 18

Similar Papers

Global Visual Feature and Linguistic State Guided Attention for Remote Sensing Image Captioning
Zhengyuan Zhang ... Wenkai Zhang
IEEE Transactions on Geoscience and Remote Sensing | VOL. 60
01 Jan 2021

End-to-End Transformer Based Model for Image Captioning
Yiyu Wang ... Jungang Xu
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 36
28 Jun 2022

Improving Image Captioning with Better Use of Caption
Zhan Shi ... Xu Zhou
01 Jan 2020

Automated Image Captioning with Multi-layer Gated Recurrent Unit
Ozge Taylan Moral ... Volkan Kilic
29 Aug 2022
