Bi-LS-AttM: A Bidirectional LSTM and Attention Mechanism Model for Improving Image Captioning

Tian Xie, Xusen Wan, Jinbao Zhang, Weiping Ding, Jiehua Wang

doi:10.3390/app13137916

Abstract

The discipline of automatic image captioning integrates two pivotal branches of artificial intelligence: computer vision (CV) and natural language processing (NLP). Its principal function is to transform extracted visual features into higher-order semantic information. The bidirectional long short-term memory (Bi-LSTM) network has gained wide acceptance for image captioning tasks. Recently, research attention has focused on adapting suitable models to generate novel and accurate captions, although tuning a model's parameters does not always yield optimal results. Given this, the current research proposes a model that combines the bidirectional LSTM with an attention mechanism (Bi-LS-AttM) for image captioning. The model exploits context from both the preceding and the following parts of the input data, together with the attention mechanism, thereby improving the precision of visual language interpretation. The distinctiveness of this research lies in its incorporation of Bi-LSTM and the attention mechanism to generate sentences that are both structurally novel and accurately reflective of the image content. To improve time efficiency and accuracy, this study replaces convolutional neural networks (CNNs) with fast region-based convolutional networks (Fast R-CNNs). Additionally, it refines the process of generating and evaluating the common space, further improving efficiency. The model was tested on the Flickr30k and MSCOCO (80 object categories) datasets. Comparative analyses of performance metrics show that our Bi-LS-AttM model outperforms unidirectional LSTM and Bi-LSTM models. On caption generation and image-sentence retrieval tasks, our model reduces time by approximately 36.5% and 26.3% compared with the Bi-LSTM model and the deep Bi-LSTM model, respectively.
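The abstract describes two components working together: a Bi-LSTM that reads a sequence in both directions, and an attention mechanism that weights image regions. The pure-Python sketch below illustrates those two ideas in isolation. It is not the authors' implementation: the abstract does not specify the gating details, the attention scoring function, or the feature sizes, so this sketch assumes a standard LSTM cell, untrained random weights, dot-product attention, and random vectors standing in for Fast R-CNN region features. All names (`LSTMCell`, `bi_lstm`, `attend`, `W_q`) are hypothetical.

```python
import math
import random

# Minimal sketch only: untrained random weights, dot-product attention, and
# random vectors standing in for Fast R-CNN region features are all assumptions.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenated [x; h] vector.
        self.W = {k: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation gives [x; h]
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

def run_lstm(cell, xs):
    h, c = [0.0] * cell.hidden_dim, [0.0] * cell.hidden_dim
    outputs = []
    for x in xs:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs

def bi_lstm(fwd_cell, bwd_cell, xs):
    # Forward pass reads tokens left to right; backward pass reads right to left,
    # then is re-reversed so both states at index t describe the same token.
    forward = run_lstm(fwd_cell, xs)
    backward = run_lstm(bwd_cell, xs[::-1])[::-1]
    return [fw + bw for fw, bw in zip(forward, backward)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, regions, W_q):
    # Project the Bi-LSTM state into region-feature space, score each region by
    # dot product, and return the weights plus the weighted-sum context vector.
    q = matvec(W_q, query)
    alpha = softmax([sum(a * b for a, b in zip(q, r)) for r in regions])
    context = [sum(a * r[k] for a, r in zip(alpha, regions)) for k in range(len(regions[0]))]
    return alpha, context

# Usage with toy sizes (hypothetical): 6 embedded tokens, 3 image regions.
rng = random.Random(0)
EMB, HID, FEAT = 4, 3, 5
words = [[rng.uniform(-1, 1) for _ in range(EMB)] for _ in range(6)]
regions = [[rng.uniform(-1, 1) for _ in range(FEAT)] for _ in range(3)]
fwd_cell, bwd_cell = LSTMCell(EMB, HID, rng), LSTMCell(EMB, HID, rng)
states = bi_lstm(fwd_cell, bwd_cell, words)  # each state has 2 * HID entries
W_q = rand_matrix(FEAT, 2 * HID, rng)
alpha, context = attend(states[-1], regions, W_q)
```

Concatenating the two directions doubles the state size, which is why `W_q` maps `2 * HID` inputs to `FEAT` outputs. In a full captioner the query would be the decoder state at each generation step and the context vector would feed the word predictor; that wiring, and all training, are omitted here.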


Journal: Applied Sciences
Publication Date: Jul 6, 2023
Citations: 3
License type: CC BY 4.0

Similar Papers

Oppositional Harris Hawks Optimization with Deep Learning-Based Image Captioning
V R Kavitha ... Yunyoung Nam
Computer Systems Science and Engineering | VOL. 44
01 Jan 2023

Gujarati Task Oriented Dialogue Slot Tagging Using Deep Neural Network Models
Rachana Parikh ... Hiren Joshi
01 Jan 2020

Detection of communicable and non-communicable diseases using hyperparameter optimization with Bi-LSTM model in pathology images
Shiva Sumanth Reddy ... C Nandini
International Journal of Intelligent Computing and Cybernetics | VOL. 16
22 Mar 2022

An attention mechanism and multi-granularity-based Bi-LSTM model for Chinese Q&A system
Xiao-Mei Yu ... Wen-Zhi Feng
Soft Computing | VOL. 24
24 Sep 2019
