Full-Memory Transformer for Image Captioning

Tongwei Lu, Jiarong Wang, Fen Min

doi:10.3390/sym15010190

Abstract

Transformer-based approaches represent the state of the art in image captioning. However, existing studies have shown that the Transformer has a problem: irrelevant tokens with overlapping neighbors incorrectly attend to each other with relatively large attention scores. We believe that this limitation is due to the incompleteness of the Self-Attention Network (SAN) and the Feed-Forward Network (FFN). To solve this problem, we present the Full-Memory Transformer method for image captioning, which improves both image encoding and language decoding. In the image encoding step, we propose the Full-LN symmetric structure, which symmetrically embeds Layer Normalization on both sides of the SAN and FFN to enable stable training and better generalization. In the language decoding step, we propose the Memory Attention Network (MAN), which extends the traditional attention mechanism to determine the correlation between attention results and input sequences, guiding the model to focus on the words that need to be attended to. Evaluated on the MS COCO dataset, our method improves BLEU-4 from 38.4 to 39.3.
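The idea of placing Layer Normalization on both sides of a sublayer can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption: the names `layer_norm`, `full_ln_block`, and `toy_ffn` are hypothetical, the residual connection and its position are guesses, and the normalization omits learned gain and bias. It is not the authors' implementation.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize one feature vector to zero mean and (near) unit variance.

    Simplified: no learned gain or bias parameters.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def full_ln_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """Hypothetical symmetric wrapper: LN -> sublayer -> residual add -> LN.

    The sublayer stands in for either the SAN or the FFN.
    """
    normed_in = layer_norm(x)                    # LN on the input side
    out = sublayer(normed_in)                    # SAN or FFN would go here
    residual = [a + b for a, b in zip(x, out)]   # assumed residual connection
    return layer_norm(residual)                  # LN on the output side


def toy_ffn(x: Vector) -> Vector:
    """Stand-in for an FFN: a fixed scaling followed by ReLU."""
    return [max(0.0, 2.0 * v) for v in x]


y = full_ln_block([1.0, 2.0, 3.0, 4.0], toy_ffn)
y_mean = sum(y) / len(y)
y_var = sum((v - y_mean) ** 2 for v in y) / len(y)
```

Under these assumptions the sublayer always sees normalized input and the block always emits normalized output, which is one plausible reading of "symmetrically embedding" the normalization.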

Journal: Symmetry
Publication Date: Jan 9, 2023
Citations: 5
License type: CC BY 4.0

Similar Papers

Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Longteng Guo ... Peng Yao
01 Jun 2020

Multi-Gate Attention Network for Image Captioning
Weitao Jiang ... Bohong Liu
IEEE Access | VOL. 9
01 Jan 2020

Joint Scence Network and Attention-Guided for Image Captioning
Dongming Zhou ... Jing Yang
01 Dec 2021

Automated Image Captioning with Multi-layer Gated Recurrent Unit
Ozge Taylan Moral ... Volkan Kilic
29 Aug 2022
