Abstract

As computer vision and natural language processing mature, image captioning has become more ambitious: we want models that generate longer, richer, and more accurate sentences as image descriptions. Most existing image caption models use an encoder-decoder structure, and most of the best-performing models incorporate attention mechanisms into that structure. However, existing image captioning methods attend only to visual features and not to keywords, so the generated sentences are often not rich or accurate enough, and errors in visual feature extraction propagate directly into incorrect captions. To fill this gap, we propose a combination attention module consisting of a visual attention module and a keyword attention module. The visual attention module quickly extracts key local features, while the keyword attention module focuses on keywords that are likely to appear in the generated sentence. The outputs of the two modules can correct each other. We embed the combination attention module into the Transformer framework, constructing a new image caption model, CAT (Combination Attention Transformer), that generates more accurate and richer caption sentences. Extensive experiments on the MSCOCO dataset demonstrate the effectiveness and superiority of our method over many state-of-the-art methods.
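To illustrate the idea, the following is a minimal sketch of how a combination attention module of this kind could be wired up in PyTorch. The module, class, and parameter names (CombinationAttention, visual_attn, keyword_attn, the gated fusion) are assumptions for illustration, not the authors' implementation; the abstract only states that a visual branch and a keyword branch attend separately and that their results can correct each other.

```python
import torch
import torch.nn as nn


class CombinationAttention(nn.Module):
    """Hypothetical sketch of a combination attention module: one attention
    branch over image region features (visual attention) and one over
    keyword embeddings (keyword attention), fused so the two results can
    correct each other. Names and fusion scheme are assumptions."""

    def __init__(self, d_model: int = 512, num_heads: int = 8):
        super().__init__()
        # Attention over encoded visual region features (visual branch).
        self.visual_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Attention over embeddings of candidate keywords (keyword branch).
        self.keyword_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Learned gate letting each branch down-weight errors in the other.
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, query, visual_feats, keyword_embs):
        # query:        (B, T, d_model) decoder hidden states
        # visual_feats: (B, R, d_model) image region features from the encoder
        # keyword_embs: (B, K, d_model) embeddings of predicted keywords
        v_ctx, _ = self.visual_attn(query, visual_feats, visual_feats)
        k_ctx, _ = self.keyword_attn(query, keyword_embs, keyword_embs)
        # Gated fusion of the two attention contexts.
        g = torch.sigmoid(self.gate(torch.cat([v_ctx, k_ctx], dim=-1)))
        return g * v_ctx + (1 - g) * k_ctx


if __name__ == "__main__":
    # Usage example with random tensors standing in for real features.
    attn = CombinationAttention()
    q = torch.randn(2, 10, 512)   # 10 decoding steps
    v = torch.randn(2, 49, 512)   # 49 image regions
    k = torch.randn(2, 5, 512)    # 5 keywords
    print(attn(q, v, k).shape)    # torch.Size([2, 10, 512])
```

In the full model such a block would replace or augment the cross-attention sublayer of each Transformer decoder layer, but the exact placement and fusion used by CAT is described in the paper itself, not in this abstract.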
