The role of image representations in vision to language tasks

Pranava Madhyastha, Lucia Specia, Josiah Wang

doi:10.1017/s1351324918000116

Open Access

https://doi.org/10.1017/s1351324918000116

Journal: Natural Language Engineering
Publication Date: Mar 21, 2018
Citations: 3
License type: cc-by-nc-nd

Affiliation: University of Sheffield

Abstract

Tasks that require modeling of both language and visual information, such as image captioning, have become very popular in recent years. Most state-of-the-art approaches make use of image representations obtained from a deep neural network, which are used to generate language information in a variety of ways with end-to-end neural-network-based models. However, it is not clear how different image representations contribute to language generation tasks. In this paper, we probe the representational contribution of the image features in an end-to-end neural modeling framework and study the properties of different types of image representations. We focus on two popular vision to language problems: the task of image captioning and the task of multimodal machine translation. Our analysis provides interesting insights into the representational properties and suggests that end-to-end approaches implicitly learn a visual-semantic subspace and exploit the subspace to generate captions.

Similar Papers

Iconographic Image Captioning for Artworks
Eva Cetinic
01 Jan 2020

Eye-movement-prompted large image captioning model
Zheng Yang ... Zhi-Hui Zhan
Pattern Recognition
01 Nov 2024

Context-Aware Visual Policy Network for Fine-Grained Image Captioning
Zheng-Jun Zha ... Yongdong Zhang
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 44
09 Apr 2019

SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning
Long Chen ... Hanwang Zhang
01 Jul 2017
