Abstract

Image captioning generates written descriptions of an image. In recent image captioning research, attention regions seldom cover all objects, so generated captions may lack object details and stray from the actual image content. In this paper, we propose a word guided attention (WGA) method for image captioning. First, WGA extracts word information from the embedded word and the memory cell by applying a transformation and multiplication. Then, WGA applies this word information to the attention results and obtains the attended feature vectors via elementwise multiplication. Finally, we apply WGA with words from different time steps to obtain previous word guided attention (PW) and current word guided attention (CW) in the decoder. Experiments on the MSCOCO dataset show that the proposed WGA achieves competitive performance against state-of-the-art methods, with PW reaching a 39.1 Bilingual Evaluation Understudy (BLEU-4) score and a 127.6 Consensus-Based Image Description Evaluation (CIDEr-D) score, and CW reaching a 39.1 BLEU-4 score and a 127.2 CIDEr-D score on the Karpathy test split.

Highlights

  • Image captioning is interdisciplinary research spanning computer vision and natural language processing to generate natural descriptions of images

  • Inspired by attention mechanisms [8] and sequence-to-sequence models [9] exploited in machine translation tasks, an encoder–decoder framework [10,11,12,13,14] has been widely used for image captioning

  • We propose a novel word guided attention (WGA) for image captioning, aimed at extracting more valuable information from images


Summary

Introduction

Image captioning is interdisciplinary research spanning computer vision and natural language processing to generate natural descriptions of images. Inspired by attention mechanisms [8] and sequence-to-sequence models [9] exploited in machine translation tasks, an encoder–decoder framework [10,11,12,13,14] has been widely used for image captioning. In such a framework, images are encoded to feature vectors by a pretrained image classification, object detection, or semantic segmentation model, and decoded to words via an RNN. However, the attention regions in these models seldom cover all objects, so generated captions can miss object details. To address this issue, we propose word guided attention (WGA), which is built from word information and brings specific guidance to the decoder. The information processing method combines memory cell weighting, embedded words, and basic attention. Based on this process, we construct a WGA module in the decoder. With the current-step word, WGA obtains more details and deeper relational information from the current attention region.
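The WGA computation described above can be sketched in a few lines of NumPy. Note that the projection matrices `W_w` and `W_c`, and the specific tanh/sigmoid gating form, are our illustrative assumptions; the paper's actual transformation may differ.

```python
import numpy as np

def word_guided_attention(attended_feat, word_emb, memory_cell, W_w, W_c):
    """Gate an attended image feature with word information.

    word_info is built from the embedded word and the decoder memory
    cell via a transformation and elementwise multiplication; it is
    then applied to the attention result elementwise (hedged sketch).
    """
    word_info = np.tanh(W_w @ word_emb) * np.tanh(W_c @ memory_cell)
    gate = 1.0 / (1.0 + np.exp(-word_info))   # squash to (0, 1)
    return attended_feat * gate               # elementwise modulation

# Toy usage with random vectors standing in for real features.
d = 8
rng = np.random.default_rng(0)
feat = rng.standard_normal(d)   # attended image feature from basic attention
word = rng.standard_normal(d)   # embedded word (previous or current step)
cell = rng.standard_normal(d)   # decoder LSTM memory cell
W_w = rng.standard_normal((d, d))
W_c = rng.standard_normal((d, d))
guided = word_guided_attention(feat, word, cell, W_w, W_c)
```

Using the previous-step word embedding for `word_emb` corresponds to the PW variant, and the current-step word to the CW variant.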

Image Captioning
Attention Mechanism
Methods
Image Captioning Model
Training and Objectives
Dataset
Implementation Details
Quantitative Analysis
Method
Qualitative Analysis
Ablative Studies
Findings
Conclusions
