VQA: Visual Question Answering

Aishwarya Agrawal, Dhruv Batra, Jiasen Lu, Margaret Mitchell, Devi Parikh, Stanislaw Antol, C. Lawrence Zitnick

doi:10.1007/s11263-016-0966-6

Abstract

We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are open-ended. Visual questions selectively target different areas of an image, including background details and underlying context. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and complex reasoning than a system producing generic image captions. Moreover, VQA is amenable to automatic evaluation, since many open-ended answers contain only a few words or a closed set of answers that can be provided in a multiple-choice format. We provide a dataset containing ~0.25M images, ~0.76M questions, and ~10M answers (www.visualqa.org), and discuss the information it provides. Numerous baselines and methods for VQA are provided and compared with human performance. Our VQA demo is available on CloudCV (http://cloudcv.org/vqa).

Journal: International Journal of Computer Vision
Publication Date: Nov 8, 2016
Citations: 297

Similar Papers

VQA: Visual Question Answering
Stanislaw Antol ... Margaret Mitchell
01 Dec 2015

Visual Question Answering
Dr Sai Madhavi D ... Pooja U Joshi
International Journal of Advanced Research in Science, Communication and Technology
13 Jul 2022

Estimating Viewed Images with Natural Language Question Answering from fMRI Data
Saya Takada ... Ren Togo
01 Mar 2020

Visual Question Answering as Reading Comprehension
Hui Li ... Anton Van Den Hengel
01 Jun 2019
