Abstract

Existing image captioning methods usually use masked convolution to generate the language description, and the traditional residual network (ResNet) connections used with masked convolution suffer from the vanishing gradient problem. To address this issue, we propose a new image captioning framework that combines a dense fusion connection (DFC) with an improved stacked attention module. DFC adopts the dense convolutional network (DenseNet) architecture to connect each layer to every other layer in a feed-forward fashion, and then uses the ResNet method to combine features through summation. The improved stacked attention module captures more fine-grained visual information that is highly relevant to word prediction. Finally, we employ a Transformer in the image encoder to fully obtain the attended image representation. Experimental results on the MS-COCO dataset demonstrate that the proposed model increases the CIDEr score from 91.2% to 106.1%, outperforming comparable models and verifying its effectiveness.
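To make the fusion idea concrete, the sketch below illustrates one plausible reading of a DFC block: DenseNet-style concatenation of all earlier layer outputs feeding each convolution, followed by a ResNet-style residual summation with the block input. This is a minimal illustration, not the authors' implementation; the class name, layer count, and use of plain (rather than masked/causal) convolutions are assumptions for brevity.

```python
import torch
import torch.nn as nn

class DFCBlock(nn.Module):
    """Hypothetical sketch of a dense fusion connection (DFC) block.

    Each layer receives the concatenation of the block input and all
    previous layer outputs (DenseNet-style connectivity); the block
    output is fused with the input by summation (ResNet-style).
    The paper's decoder uses masked convolutions; a plain Conv1d is
    used here for simplicity.
    """

    def __init__(self, channels: int, num_layers: int = 3):
        super().__init__()
        # Layer i sees (i + 1) * channels inputs: the block input
        # plus the outputs of all i earlier layers.
        self.layers = nn.ModuleList(
            [nn.Conv1d((i + 1) * channels, channels, kernel_size=3, padding=1)
             for i in range(num_layers)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        out = x
        for layer in self.layers:
            # Dense connectivity: forward every earlier feature map.
            out = torch.relu(layer(torch.cat(features, dim=1)))
            features.append(out)
        # ResNet-style fusion: combine features through summation,
        # giving gradients a short path back to the input.
        return x + out

# Example usage with illustrative shapes:
block = DFCBlock(channels=512)
tokens = torch.randn(8, 512, 20)   # (batch, channels, sequence length)
fused = block(tokens)              # same shape, densely fused features
```

Summing the final feature map with the block input keeps the output width fixed while still letting every layer draw on all earlier features, which is the combination of DenseNet connectivity and ResNet summation the abstract describes as mitigating vanishing gradients.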
