Image Caption Generation Using Contextual Information Fusion With Bi-LSTM-s

Huawei Zhang, Chengbo Ma, Zhanjun Jiang, Jing Lian

doi:10.1109/access.2022.3232508

Open Access

https://doi.org/10.1109/access.2022.3232508

Journal: IEEE Access
Publication Date: Jan 1, 2023
Citations: 16
License type: CC BY 4.0

Affiliation: Lanzhou Jiaotong University

Abstract

Image caption generation requires describing image content in accurate natural language. In the existing encoder-decoder structure, the decoder generates words one at a time in front-to-back order and therefore cannot analyze the full context of the sentence. This paper employs a Bi-LSTM (Bi-directional Long Short-Term Memory) structure that draws not only on past information but also on subsequent information, so that image content is predicted from contextual clues on both sides. The visual information is fed separately into the F-LSTM decoder (forward LSTM decoder) and the B-LSTM decoder (backward LSTM decoder) to extract semantic information, and their semantic outputs complement each other. Specifically, a subsidiary attention mechanism, S-Att, acts between F-LSTM and B-LSTM and uses attention to extract the semantic information of both decoders. Meanwhile, the semantic interaction between the two decoders is extracted according to similarity while their hidden states are aligned, and the fused semantic information is output. The resulting Bi-LSTM-s model extracts contextual information and produces finer-grained image captions. Our model improves on the original LSTM by 9.7%. In addition, it effectively resolves the inconsistency between the semantic information produced in the forward and backward directions at the same time step, and achieves a BLEU-4 score of 37.5. The superiority of this approach is demonstrated experimentally on the MSCOCO dataset.
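The abstract describes aligning the hidden states of the forward and backward decoders and fusing them through similarity-based attention (S-Att), but it gives no formulas. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes that the backward decoder's states are reversed so that index t refers to the same word position in both directions, that similarity is a scaled dot product followed by a softmax, and that fusion is simple concatenation of each forward state with its attended backward context. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align_backward(backward_states: List[Vector]) -> List[Vector]:
    """Assumption: the backward decoder reads the caption right-to-left, so its
    step k corresponds to word position T-1-k. Reversing the list puts both
    decoders' states in the same left-to-right word order."""
    return list(reversed(backward_states))


def cross_attend(forward_states: List[Vector],
                 backward_states: List[Vector]) -> Tuple[List[Vector], List[List[float]]]:
    """For each forward state f_t, weight every aligned backward state b_j by
    softmax(f_t . b_j / sqrt(d)) and return the weighted sums (contexts)
    together with the attention weights."""
    d = len(forward_states[0])
    contexts, weights = [], []
    for f in forward_states:
        scores = [dot(f, b) / math.sqrt(d) for b in backward_states]
        a = softmax(scores)
        ctx = [sum(a_j * b[i] for a_j, b in zip(a, backward_states)) for i in range(d)]
        contexts.append(ctx)
        weights.append(a)
    return contexts, weights


def fuse(forward_states: List[Vector],
         backward_states: List[Vector]) -> Tuple[List[Vector], List[List[float]]]:
    """Hypothetical fusion step: concatenate each forward state with its
    attended backward context, giving [f_t ; c_t] of size 2d."""
    aligned = align_backward(backward_states)
    contexts, weights = cross_attend(forward_states, aligned)
    fused = [f + c for f, c in zip(forward_states, contexts)]  # list concatenation
    return fused, weights


if __name__ == "__main__":
    forward = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # forward decoder, steps 0..2
    backward = [[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]  # backward decoder, emitted right-to-left
    fused, weights = fuse(forward, backward)
    print(len(fused), len(fused[0]))  # 3 4
```

Concatenation keeps both directions' information intact for a downstream word classifier. A weighted sum or a learned gate would be equally plausible choices, and the abstract does not say which one Bi-LSTM-s uses.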

Similar Papers

Bi-LS-AttM: A Bidirectional LSTM and Attention Mechanism Model for Improving Image Captioning
Tian Xie ... Jinbao Zhang
Applied Sciences | VOL. 13
06 Jul 2023

Adaptive Attention-based High-level Semantic Introduction for Image Caption
Xiaoxiao Liu ... Qingyang Xu
ACM Transactions on Multimedia Computing, Communications, and Applications | VOL. 16
30 Nov 2020

Parallel-fusion LSTM with synchronous semantic and visual information for image captioning
Jing Zhang ... Zhe Wang
Journal of Visual Communication and Image Representation | VOL. 75
01 Feb 2021

Synthesis of Vision and Language: Multifaceted Image Captioning Application
Arpit Gupta ... Ishita Kohli
International Journal of Scientific Research in Engineering and Management | VOL. 07
23 Dec 2023