Visual Question Answering as Reading Comprehension

Hui Li, Peng Wang, Chunhua Shen, Anton Van Den Hengel

doi:10.1109/cvpr.2019.00648

Abstract

Visual question answering (VQA) demands simultaneous comprehension of both the visual content of an image and a natural language question. In some cases, the reasoning needs the help of common sense or general knowledge, which usually appears in the form of text. Current methods jointly embed both the visual information and the textual features into the same space. However, modeling the complex interactions between the two modalities is not an easy task. Rather than struggling with multimodal feature fusion, in this paper we propose to unify all the input information in natural language, so as to convert VQA into a machine reading comprehension problem. With this transformation, our method not only can tackle VQA datasets that focus on observation-based questions, but can also be naturally extended to handle knowledge-based VQA, which requires exploring a large-scale external knowledge base. It is a step towards being able to exploit large volumes of text and natural language processing techniques to address the VQA problem. Two types of models are proposed, to deal with open-ended VQA and multiple-choice VQA respectively. We evaluate our models on three VQA benchmarks. Performance comparable with the state of the art demonstrates the effectiveness of the proposed method.
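As a rough illustration of the idea of unifying all inputs in natural language, the sketch below assembles a text passage from image descriptions and knowledge sentences and pairs it with the question, giving the usual input shape of a reading comprehension task. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the image is turned into text, so the `region_descriptions` input, the `build_example` helper, the `ReadingComprehensionExample` type, and the sample sentences are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class ReadingComprehensionExample:
    """Hypothetical container: a text passage, a question, optional answer choices."""
    passage: str
    question: str
    candidates: List[str] = field(default_factory=list)


def build_example(
    region_descriptions: Iterable[str],
    knowledge_sentences: Iterable[str],
    question: str,
    candidates: Iterable[str] = (),
) -> ReadingComprehensionExample:
    """Join image descriptions and external knowledge into one passage.

    Assumption: the image has already been described in text by some
    upstream component; that step is outside this sketch.
    """
    sentences = []
    for text in list(region_descriptions) + list(knowledge_sentences):
        text = text.strip()
        if text:
            # Normalize so every sentence ends with exactly one period.
            sentences.append(text.rstrip(".") + ".")
    return ReadingComprehensionExample(
        passage=" ".join(sentences),
        question=question,
        candidates=list(candidates),
    )


# Illustrative, made-up inputs for a multiple-choice question.
example = build_example(
    region_descriptions=["a man is holding an umbrella", "the street is wet"],
    knowledge_sentences=["An umbrella is used for protection against rain."],
    question="Why is the man holding an umbrella?",
    candidates=["it is raining", "he is cooking"],
)
print(example.passage)
```

In this form, an open-ended question would leave `candidates` empty, while a multiple-choice question would carry its answer options alongside the passage. Either way the downstream model sees only text.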

Similar Papers

Multimodal feature fusion by relational reasoning and attention for visual question answering
Weifeng Zhang ... Zengchang Qin
Information Fusion | VOL. 55
19 Aug 2019

Estimating Viewed Images with Natural Language Question Answering from fMRI Data
Saya Takada ... Takahiro Ogawa
01 Mar 2020

ConceptBert: Concept-Aware Representation for Visual Question Answering
François Gardères ... Freddy Lecue
01 Jan 2020

VQA: Visual Question Answering
Aishwarya Agrawal ... Stanislaw Antol
International Journal of Computer Vision | VOL. 123
08 Nov 2016
