Abstract

Image captioning helps people understand images through fine-grained, automatically generated descriptions. Recently, encoder-decoder architectures with attention mechanisms have achieved notable success in image captioning and visual question answering. In this paper, we propose a new captioning algorithm that integrates two separate LSTM (Long Short-Term Memory) networks through an adaptive semantic attention model. In our approach, the first LSTM network is followed by an attention model that serves as a visual sentinel, allowing the model to flexibly trade off between visual semantic regions and textual content. The second LSTM acts as a language model: it combines the hidden state of the first LSTM with the attention context vector and outputs the word sequence. The proposed model has been extensively evaluated on two large-scale datasets: MSCOCO and Flickr30k. Experimental results show that the proposed method attends more closely to visually salient regions and achieves significant improvements over prior state-of-the-art approaches on multiple evaluation metrics.
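
To make the described architecture concrete, below is a minimal PyTorch-style sketch of one decoding step with two LSTMs and an adaptive attention module that includes a visual sentinel. All class names, layer shapes, and the exact gating formula are assumptions for illustration; the abstract does not specify these details, so this is not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class AdaptiveSemanticAttention(nn.Module):
    """Sketch: attention over K region features plus a visual sentinel,
    so each step can weight visual evidence against textual context."""

    def __init__(self, dim, att_dim):
        super().__init__()
        self.region_proj = nn.Linear(dim, att_dim)    # project region features
        self.sentinel_proj = nn.Linear(dim, att_dim)  # project sentinel vector
        self.hidden_proj = nn.Linear(dim, att_dim)    # project decoder hidden state
        self.score = nn.Linear(att_dim, 1)            # scalar attention score

    def forward(self, regions, sentinel, hidden):
        # regions: (B, K, dim), sentinel: (B, dim), hidden: (B, dim)
        h = self.hidden_proj(hidden).unsqueeze(1)                                     # (B, 1, A)
        e_regions = self.score(torch.tanh(self.region_proj(regions) + h))             # (B, K, 1)
        e_sent = self.score(torch.tanh(self.sentinel_proj(sentinel).unsqueeze(1) + h))  # (B, 1, 1)
        alpha = torch.softmax(torch.cat([e_regions, e_sent], dim=1), dim=1)           # (B, K+1, 1)
        beta = alpha[:, -1]                                  # (B, 1): weight placed on the sentinel
        visual_ctx = (alpha[:, :-1] * regions).sum(dim=1)    # (B, dim): attended visual context
        # High beta -> rely on the (textual) sentinel; low beta -> rely on visual regions.
        context = beta * sentinel + (1.0 - beta) * visual_ctx
        return context, beta


class TwoLSTMDecoderStep(nn.Module):
    """Sketch of one step: attention LSTM -> visual sentinel -> adaptive
    attention -> language LSTM -> word distribution (hypothetical layout)."""

    def __init__(self, dim, vocab_size):
        super().__init__()
        self.attn_lstm = nn.LSTMCell(2 * dim, dim)   # first LSTM: word embedding + global image feature
        self.lang_lstm = nn.LSTMCell(2 * dim, dim)   # second LSTM: context vector + first LSTM's state
        self.attention = AdaptiveSemanticAttention(dim, dim)
        self.sentinel_gate = nn.Linear(2 * dim, dim)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, word_emb, global_feat, regions, state1, state2):
        h1, c1 = self.attn_lstm(torch.cat([word_emb, global_feat], dim=1), state1)
        # Visual sentinel: a gated view of the first LSTM's memory cell.
        gate = torch.sigmoid(self.sentinel_gate(torch.cat([word_emb, h1], dim=1)))
        sentinel = gate * torch.tanh(c1)
        context, beta = self.attention(regions, sentinel, h1)
        # Language LSTM combines the first LSTM's hidden state with the context vector.
        h2, c2 = self.lang_lstm(torch.cat([context, h1], dim=1), state2)
        logits = self.out(h2)                         # (B, vocab_size) scores for the next word
        return logits, (h1, c1), (h2, c2), beta


# Example usage with random tensors (batch of 2, 36 regions, 512-d features).
if __name__ == "__main__":
    B, K, D, V = 2, 36, 512, 10000
    step = TwoLSTMDecoderStep(D, V)
    zeros = lambda: (torch.zeros(B, D), torch.zeros(B, D))
    logits, s1, s2, beta = step(torch.randn(B, D), torch.randn(B, D),
                                torch.randn(B, K, D), zeros(), zeros())
    print(logits.shape, beta.shape)  # torch.Size([2, 10000]) torch.Size([2, 1])
```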
