Abstract

Image captioning in natural language has been an emerging trend. However, social images, each associated with a set of user-contributed tags, have rarely been investigated for this task. User-contributed tags, which can reflect user attention, have been neglected in conventional image captioning, and most existing captioning models cannot be applied directly to social images. In this work, a dual attention model is proposed for social image captioning that combines visual attention and user attention simultaneously. Visual attention is used to compress a large amount of salient visual information, while user attention is applied to adjust the description of social images according to their user-contributed tags. Experiments conducted on the Microsoft (MS) COCO dataset demonstrate the superiority of the proposed dual attention method.

Highlights

  • Image caption generation is a hot topic in computer vision and machine learning

  • We propose a novel dual attention model (DAM) for social image captioning based on visual attention and user attention

  • The image is commonly encoded as a convolutional neural network (CNN) feature vector, and the decoder is usually modeled with a recurrent neural network (RNN); see the sketch after this list
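
As a concrete illustration of this encoder-decoder pipeline, here is a minimal PyTorch sketch (PyTorch is an assumption; the paper does not name a framework). A pretrained ResNet-50 stands in for the CNN encoder and a GRU for the RNN decoder; all class, parameter, and dimension names are illustrative, not taken from the paper.

    import torch
    import torch.nn as nn
    from torchvision import models

    class CaptionDecoder(nn.Module):
        """Hypothetical RNN decoder conditioned on a global CNN feature."""
        def __init__(self, vocab_size, feat_dim=2048, embed_dim=256, hidden_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.init_h = nn.Linear(feat_dim, hidden_dim)  # image feature -> initial state
            self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, image_feat, captions):
            # image_feat: (B, feat_dim) global CNN feature; captions: (B, T) token ids
            h0 = torch.tanh(self.init_h(image_feat)).unsqueeze(0)  # (1, B, hidden_dim)
            emb = self.embed(captions)                             # (B, T, embed_dim)
            hidden, _ = self.rnn(emb, h0)                          # (B, T, hidden_dim)
            return self.out(hidden)                                # (B, T, vocab_size) logits

    # Encoder: a pretrained CNN with its classification head removed.
    cnn = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    cnn.fc = nn.Identity()  # expose the 2048-d pooled feature vector

    feats = cnn(torch.randn(4, 3, 224, 224))                       # (4, 2048)
    logits = CaptionDecoder(vocab_size=10000)(feats, torch.zeros(4, 12, dtype=torch.long))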


Summary

Introduction

Image caption generation is a hot topic in computer vision and machine learning, and rapid progress has been made in this area with deep learning recently. Xu et al. [4] proposed "soft visual attention", where only visual features are used to generate image captions (see Figure 2a). We propose a novel dual attention model (DAM) to explore social image captioning based on both visual attention and user attention (see Figure 2c). Social image captioning aims to generate diverse descriptions guided by the corresponding user tags. User attention is proposed to address the differing effects of the generated visual descriptions and the user tags, which leads to personalized social image captions. A dual attention model is proposed for social image captioning that combines visual attention and user attention simultaneously; in this way, the generated descriptions maintain both accuracy and diversity.
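
To make the combination of the two attentions concrete, below is a hedged sketch of one decoding step: soft attention over spatial CNN features (visual attention) and over embeddings of the user-contributed tags (user attention), fused by a sigmoid gate driven by the decoder state. The gated fusion and every name here are assumptions for illustration; the paper's exact DAM equations may differ.

    import torch
    import torch.nn as nn

    class DualAttention(nn.Module):
        def __init__(self, feat_dim=2048, tag_dim=256, hidden_dim=512):
            super().__init__()
            self.vis_score = nn.Linear(feat_dim + hidden_dim, 1)  # visual-attention scorer
            self.tag_score = nn.Linear(tag_dim + hidden_dim, 1)   # user-attention scorer
            self.proj_tag = nn.Linear(tag_dim, feat_dim)          # match the two context dims
            self.gate = nn.Linear(hidden_dim, 1)                  # balances the two contexts

        def attend(self, scorer, feats, h):
            # feats: (B, N, D) candidate features; h: (B, hidden_dim) decoder state
            h_exp = h.unsqueeze(1).expand(-1, feats.size(1), -1)
            alpha = torch.softmax(scorer(torch.cat([feats, h_exp], -1)).squeeze(-1), dim=1)
            return (alpha.unsqueeze(-1) * feats).sum(1)           # weighted context (B, D)

        def forward(self, vis_feats, tag_embs, h):
            # vis_feats: (B, R, feat_dim) spatial CNN features over R regions
            # tag_embs:  (B, K, tag_dim) embeddings of K user-contributed tags
            c_vis = self.attend(self.vis_score, vis_feats, h)     # visual context
            c_tag = self.proj_tag(self.attend(self.tag_score, tag_embs, h))  # tag context
            beta = torch.sigmoid(self.gate(h))                    # (B, 1) mixing gate
            return beta * c_vis + (1 - beta) * c_tag              # fused context vector

In a full model, the fused context vector would be fed, together with the previous word embedding, into the RNN decoder at each time step.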

Related Work
Preliminaries
Dual Attention Model Architecture
Visual Attention
User Attention
Combination of Visual and User Attentions
Datasets and Evaluation Metrics
Overall Comparisons by Using Visual Attributes
Overall Comparison by Using Man-Made User Tags
The Influence of Noise on the Dual Attention Model
Qualitative Analysis
Conclusions

