Abstract

Generating a sentence that accurately captures the main idea of an ambiguous visual scene is a significant and demanding task. Conventional image captioning schemes fall into two classes: retrieval-oriented schemes and generation-oriented schemes. An image caption generation system should produce precise, fluent, natural, and informative sentences, and accurately identify the content of the image, such as the scene, objects, relationships, and object attributes. However, it is challenging to express an image's content accurately when generating captions, because not all visual information can be exploited. In this article, a new image captioning model is introduced that includes three main phases: (1) extraction of Inception V3 features, (2) dual (visual and textual) attention generation, and (3) image caption generation. A Convolutional Neural Network (CNN) generates visual attention from the extracted Inception V3 features. The input texts for the associated images, in turn, are analyzed and fed to an LSTM to produce textual attention. A Bidirectional LSTM (BI-LSTM) then combines the textual and visual attention to generate image captions. In particular, the Self Improved Electric Fish Optimization (SI-EFO) algorithm is used to optimize the weights of the BI-LSTM. Finally, several measures confirm that the implemented system is improved: the adopted model is 35.21%, 33.76%, 39.52%, 29.69%, 30.12%, 21.49%, and 31.71% better than the GAN-RL, LSTM, GRU, EC + GOA, EC + CMBO, EC + DA, and EC + EFO models, respectively.
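To make the dual-attention idea concrete, the following is a minimal NumPy sketch of how visual and textual attention contexts could be computed and fused before being passed to a decoder. This is an illustrative simplification, not the paper's implementation: the feature arrays are random stand-ins for Inception V3 region features and LSTM word encodings, the scoring function is plain dot-product attention, and all names and dimensions are assumptions.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(features, query):
    # score each feature vector against the decoder query,
    # then return the softmax-weighted sum as the context vector
    scores = features @ query          # (n,) similarity scores
    weights = softmax(scores)          # attention distribution, sums to 1
    return weights @ features, weights

rng = np.random.default_rng(0)
d = 8                                   # toy feature dimension (assumption)
visual = rng.normal(size=(5, d))        # stand-in for Inception V3 region features
textual = rng.normal(size=(7, d))       # stand-in for LSTM word encodings
query = rng.normal(size=d)              # stand-in decoder state at one time step

v_ctx, v_w = attend(visual, query)      # visual attention context
t_ctx, t_w = attend(textual, query)     # textual attention context
fused = np.concatenate([v_ctx, t_ctx])  # joint context a BI-LSTM decoder could consume
```

In the full model, `fused` would feed a BI-LSTM decoder whose weights are tuned by SI-EFO; that optimization step is omitted here.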
