Incorporating Verb Semantic Information in Visual Question Answering Through Multitask Learning Paradigm

Mehrdad Alizadeh, Barbara Di Eugenio

doi:10.1142/s1793351x20400085

Abstract

Visual Question Answering (VQA) concerns providing answers to natural language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Although the task is grounded in visual processing, the language understanding component becomes crucial when the question focuses on events described by verbs. Our hypothesis is that models should be aware of verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no existing VQA dataset includes verb semantic information. Our first contribution is a new VQA dataset (imSituVQA) that we built by taking advantage of the imSitu annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet. Second, we propose a multi-task CNN-LSTM VQA model that learns to classify the answers as well as the semantic frame elements. Our experiments show that semantic frame element classification helps the VQA system avoid inconsistent responses and improves performance. Third, we employ an automatic semantic role labeler to annotate a subset of the VQA dataset (VQAsub), so that the proposed multi-task CNN-LSTM VQA model can be trained on VQAsub as well. The results show a slight improvement over the single-task CNN-LSTM model.
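The multi-task idea in the abstract, one shared representation feeding both an answer classifier and a frame element classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear heads, the toy feature vector standing in for the fused CNN-LSTM output, and the `frame_weight` balancing factor are all assumptions made for illustration, since the abstract does not state how the two losses are combined.

```python
import math

def softmax_cross_entropy(logits, target):
    """Cross-entropy of a softmax over `logits` against integer class `target`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def linear_head(features, weights):
    """One logit per class: dot product of the shared features with each weight row."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def multitask_loss(features, answer_w, frame_w, answer_y, frame_y, frame_weight=0.5):
    """Answer loss plus a weighted frame element loss over the same shared features.

    `frame_weight` is a hypothetical hyperparameter, not taken from the paper.
    """
    answer_loss = softmax_cross_entropy(linear_head(features, answer_w), answer_y)
    frame_loss = softmax_cross_entropy(linear_head(features, frame_w), frame_y)
    return answer_loss + frame_weight * frame_loss

# Toy stand-in for the fused image-question representation (assumed values).
shared = [0.2, -0.1, 0.4]
answer_w = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]                    # 2 candidate answers
frame_w = [[0.3, 0.3, 0.3], [-0.2, 0.1, 0.0], [0.0, 0.0, 1.0]]    # 3 frame elements

loss = multitask_loss(shared, answer_w, frame_w, answer_y=0, frame_y=2)
print(f"joint loss: {loss:.4f}")
```

Because both heads read the same features, gradients from the frame element loss would also shape the shared representation during training, which is the mechanism a multi-task setup relies on. Setting `frame_weight=0.0` in this sketch reduces it to the single-task answer loss.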


More From: International Journal of Semantic Computing


Similar Papers

Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach
Mehrdad Alizadeh ... Barbara Di Eugenio
01 Feb 2020

Multi-view Visual Question Answering Dataset for Real Environment Applications
Yue Qiu ... Kenji Iwata
01 Jan 2020

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal ... Douglas Summers-Stay
International Journal of Computer Vision | VOL. 127
11 Sep 2018

VizWiz Grand Challenge: Answering Visual Questions from Blind People
Danna Gurari ... Chi Lin
01 Jun 2018
