Abstract

Visual attention has emerged as a prominent approach for improving the effectiveness of image captioning, as it enables the decoder network to focus selectively on the most salient regions of the image, thereby facilitating the generation of precise and informative captions. Although visual attention achieves these improvements, the small numerical values of its input degrade its softmax, decreasing its effectiveness. To address this limitation, we propose a refined visual attention (RVA) framework that internally reweights visual attention by leveraging the language context of previously generated words. We first feed the language context into a fully connected layer to project it to the dimensionality of the visual features. We then apply a sigmoid function to obtain a probability distribution and reweight the softmax input through element-wise multiplication. Experiments conducted on the MS COCO dataset demonstrate that RVA outperforms traditional visual attention and other existing image captioning methods, highlighting its effectiveness in enhancing the accuracy and informativeness of image captions.
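
For concreteness, the sketch below shows how such gated reweighting could look in PyTorch, assuming a standard additive (Bahdanau-style) attention backbone. The module name RefinedVisualAttention, the individual projection layers, and the exact placement of the sigmoid gate on the fused pre-softmax features are illustrative assumptions based on the abstract, not the authors' implementation.

```python
# Minimal sketch of refined visual attention (RVA) as described in the
# abstract: a sigmoid gate derived from the language context reweights
# the softmax input. Names and gate placement are assumptions.
import torch
import torch.nn as nn

class RefinedVisualAttention(nn.Module):
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.w_v = nn.Linear(feat_dim, attn_dim)    # project visual features
        self.w_h = nn.Linear(hidden_dim, attn_dim)  # project decoder state
        self.w_c = nn.Linear(hidden_dim, attn_dim)  # project language context
        self.score = nn.Linear(attn_dim, 1)         # scalar score per region

    def forward(self, feats, hidden, context):
        # feats: (B, R, feat_dim); hidden, context: (B, hidden_dim)
        fused = torch.tanh(self.w_v(feats) + self.w_h(hidden).unsqueeze(1))
        # Sigmoid gate from the language context reweights the softmax input
        # via element-wise multiplication before scoring.
        gate = torch.sigmoid(self.w_c(context)).unsqueeze(1)    # (B, 1, attn_dim)
        logits = self.score(fused * gate).squeeze(-1)           # (B, R)
        alpha = torch.softmax(logits, dim=-1)                   # attention weights
        attended = (alpha.unsqueeze(-1) * feats).sum(dim=1)     # (B, feat_dim)
        return attended, alpha
```

In this reading, the gate counteracts the small-valued softmax input noted in the abstract by rescaling the fused features before the scoring layer; where exactly the multiplication is applied would need to be confirmed against the paper itself.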
