Parallel-fusion LSTM with synchronous semantic and visual information for image captioning

Jing Zhang, Kangkang Li, Zhe Wang

doi:10.1016/j.jvcir.2021.103044

Abstract

To synchronously combine dynamic semantic and visual information in the decoder of an image captioning model, this paper proposes a novel parallel-fusion LSTM (pLSTM) structure. Two parallel LSTMs, one driven by image attributes and the other by visual features, are fused through their hidden states at every time step. As a result, the attribute and visual information complement or reinforce each other, which yields more accurate captions. Depending on how semantic information from the attribute LSTM is integrated into the visual LSTM, we propose two models: pLSTM with attention (pLSTM-A) and pLSTM with guiding (pLSTM-G). pLSTM-A automatically captures the crucial semantic and visual information for caption generation, while pLSTM-G uses the synchronous semantic information to directly adjust the hidden state of the visual LSTM toward the critical image regions. To verify the effectiveness of pLSTM, we conduct a series of experiments on the MSCOCO and Flickr30K datasets, and the results show that our models outperform several state-of-the-art image captioning methods.
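The abstract describes the decoder only at a high level, so the following is a minimal, stdlib-only Python sketch under stated assumptions, not the authors' implementation. Two LSTM cells step in parallel: one receives the previous word embedding concatenated with an attribute vector, the other the word embedding concatenated with a pooled visual feature. Their hidden states are fused at every time step. The two fusion functions are hypothetical simplifications: `fuse_attention` applies a softmax over the two hidden states (standing in for pLSTM-A), and `fuse_guiding` computes a sigmoid gate from the semantic state that rescales the visual LSTM's hidden state, which is then carried forward (standing in for pLSTM-G). All names, dimensions, and random weights are illustrative, and biases are omitted for brevity.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Standard LSTM cell without biases (simplification for the sketch)."""

    def __init__(self, input_dim, hidden_dim, rng):
        # One weight matrix per gate, applied to the concatenation [x; h].
        self.W = {g: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for g in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # list concatenation = [x; h]
        i = [sigmoid(v) for v in matvec(self.W["i"], z)]
        f = [sigmoid(v) for v in matvec(self.W["f"], z)]
        o = [sigmoid(v) for v in matvec(self.W["o"], z)]
        g = [math.tanh(v) for v in matvec(self.W["c"], z)]
        c_new = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h_new = [oi * math.tanh(ci) for oi, ci in zip(o, c_new)]
        return h_new, c_new


def fuse_attention(h_sem, h_vis, w_att):
    # Hypothetical pLSTM-A-style fusion: softmax weights over the two hidden states.
    scores = [sum(w * x for w, x in zip(w_att, hs)) for hs in (h_sem, h_vis)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    alpha = [e / sum(exps) for e in exps]
    fused = [alpha[0] * a + alpha[1] * b for a, b in zip(h_sem, h_vis)]
    return fused, alpha


def fuse_guiding(h_sem, h_vis, W_guide):
    # Hypothetical pLSTM-G-style fusion: a gate computed from the semantic
    # state rescales each unit of the visual hidden state.
    gate = [sigmoid(v) for v in matvec(W_guide, h_sem)]
    return [gi * hv for gi, hv in zip(gate, h_vis)]


def plstm_step(word, attrs, visual, state, params, mode):
    (h_s, c_s), (h_v, c_v) = state
    h_s, c_s = params["sem"].step(word + attrs, h_s, c_s)  # attribute LSTM
    h_v, c_v = params["vis"].step(word + visual, h_v, c_v)  # visual LSTM
    if mode == "attention":
        fused, _ = fuse_attention(h_s, h_v, params["w_att"])
    elif mode == "guide":
        h_v = fuse_guiding(h_s, h_v, params["W_guide"])  # adjusted state is carried forward
        fused = h_v
    else:
        raise ValueError("mode must be 'attention' or 'guide'")
    return fused, ((h_s, c_s), (h_v, c_v))


def run_demo(mode, steps=3, seed=0):
    rng = random.Random(seed)
    emb, attr, vis, hid = 4, 3, 5, 6  # illustrative sizes
    params = {
        "sem": LSTMCell(emb + attr, hid, rng),
        "vis": LSTMCell(emb + vis, hid, rng),
        "w_att": [rng.uniform(-0.1, 0.1) for _ in range(hid)],
        "W_guide": rand_matrix(hid, hid, rng),
    }
    attrs = [rng.random() for _ in range(attr)]  # stand-in for attribute probabilities
    visual = [rng.random() for _ in range(vis)]  # stand-in for a pooled CNN feature
    zeros = [0.0] * hid
    state = ((zeros, zeros), (zeros, zeros))
    fused = zeros
    for _ in range(steps):
        word = [rng.random() for _ in range(emb)]  # stand-in for a word embedding
        fused, state = plstm_step(word, attrs, visual, state, params, mode)
    return fused


print(len(run_demo("attention")), len(run_demo("guide")))
```

In a complete captioner, the fused state would feed a softmax over the vocabulary at each step, and all weights would be learned rather than random. The sketch only illustrates where the two fusion variants sit in the decoding loop.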

Journal: Journal of Visual Communication and Image Representation
Publication Date: Feb 1, 2021
Citations: 14

Similar Papers

Adaptive Semantic-Enhanced Transformer for Image Captioning.
Jing Zhang ... Zhe Wang
IEEE Transactions on Neural Networks and Learning Systems | VOL. 35
01 Feb 2024

Image Captioning Based on Semantic Scenes.
Fengzhi Zhao ... Yi Lv
Entropy (Basel, Switzerland) | VOL. 26
18 Oct 2024

Ontology-based Concept Similarity Integrating Image Semantic and Visual Information
Mengyun Wang ... Hailiang Yu
29 Sep 2014

Parallel Semantic Fusion Image Caption Generation Analysis Theory
Huawei Zhang ... Chengbo Ma
17 Jun 2022