Relational-Convergent Transformer for image captioning

Lizhi Chen, You Yang, Juntao Hu, Longyue Pan, Hao Zhai

doi:10.1016/j.displa.2023.102377

Abstract

Image captioning describes the visual content of a given image in natural language sentences and plays a key role in the fusion and utilization of image features. However, in existing image captioning models, the decoder sometimes fails to efficiently capture the relationships between image features because the features lack sequential dependencies. In this paper, we propose a Relational-Convergent Transformer (RCT) network to obtain complex intramodality representations in image captioning. In RCT, a Relational Fusion Module (RFM) is designed to capture the local and global information of an image through recursive fusion. Then, a Relational-Convergent Attention (RCA) is proposed, which is composed of a self-attention module and a hierarchical fusion module and aggregates global relational information to extract a more comprehensive intramodal contextual representation. To validate the effectiveness of the proposed model, extensive experiments are conducted on the MSCOCO dataset. The experimental results show that the proposed method outperforms some state-of-the-art methods.
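
The abstract names self-attention as one component of RCA but gives no formulation. As a point of reference only, the following is a minimal sketch of generic scaled dot-product self-attention over a set of image region features. It is not the authors' RCA or RFM. The function names, the projection matrices `w_q`, `w_k`, `w_v`, and the feature sizes are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(features, w_q, w_k, w_v):
    """Generic scaled dot-product self-attention (illustrative, not the paper's RCA).

    features: (n_regions, d_model) array of image region features (hypothetical input).
    w_q, w_k, w_v: (d_model, d_k) projection matrices (hypothetical parameters).
    """
    q = features @ w_q
    k = features @ w_k
    v = features @ w_v
    # Pairwise similarity between regions, scaled by sqrt of the key dimension.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Each row is a distribution over all regions.
    weights = softmax(scores, axis=-1)
    # Each output row is a weighted mix of the value vectors of all regions.
    return weights @ v, weights

# Hypothetical sizes: 5 image regions with 8-dimensional features.
rng = np.random.default_rng(0)
n_regions, d_model = 5, 8
features = rng.normal(size=(n_regions, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
context, weights = self_attention(features, w_q, w_k, w_v)
```

Because every region attends to every other region regardless of position, the operation relates image features without relying on any sequential order among them.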

Journal: Displays
Publication Date: Jan 25, 2023
Citations: 10

Similar Papers

A review on image captioning system from artificial intelligence, machine learning and deep learning techniques
B S Revathi ... Kowshalya A Meena
i-manager’s Journal on Image Processing | VOL. 9
01 Jan 2021

Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network
Jiayi Ji ... Yunpeng Luo
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 35
18 May 2021

GuessWhich? Visual dialog with attentive memory network
Lei Zhao ... Lianli Gao
Pattern Recognition | VOL. 114
14 Jan 2021

Cascaded feature fusion with multi-level self-attention mechanism for object detection
Chuanxu Wang ... Huiru Wang
Pattern Recognition | VOL. 138
05 Feb 2023
