Abstract

Visual question answering (VQA) is a challenging task that requires reasoning over questions about images with external knowledge. A prerequisite for VQA is the availability of annotated datasets, yet existing datasets have several limitations. 1) The diversity of questions and answers is limited to a few question categories and certain concepts (e.g., objects, relations, actions), with somewhat mechanical answers. 2) Background knowledge and context information are disregarded: only images, questions, and answers are provided. 3) The timeliness of knowledge has not been examined, even though some works introduce factual or commonsense knowledge bases such as ConceptNet and DBpedia. In this paper, we present an Event-oriented Visual Question Answering (E-VQA) dataset containing free-form questions and answers about real-world event concepts, which provides event context information as domain knowledge in addition to images. E-VQA consists of 2,690 social media images, 9,088 questions, 5,479 answers, and 1,157 news media articles serving as references, annotated across 182 real-world events and covering a wide range of topics such as armed conflicts and attacks, disasters and accidents, and law and crime. For comparison, we evaluate 10 state-of-the-art VQA methods as benchmarks. The dataset is available at https://github.com/zhengyang5/E-VQA.
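To make the dataset structure described above concrete, the following is a minimal sketch of how one might iterate over E-VQA examples, assuming the annotations are distributed as a single JSON file. The file name and all field names are hypothetical illustrations, not the repository's actual format; consult https://github.com/zhengyang5/E-VQA for the real layout.

```python
import json

def load_examples(path):
    """Yield one (image, question, answer, event, references) tuple per annotation.

    Assumes a hypothetical JSON list of records; the keys below are
    illustrative and may not match the released annotation schema.
    """
    with open(path, encoding="utf-8") as f:
        annotations = json.load(f)
    for record in annotations:
        yield (
            record["image_path"],   # social media image
            record["question"],     # free-form question about the event
            record["answer"],       # free-form answer
            record["event"],        # one of the 182 real-world events
            record["references"],   # linked news media articles used as context
        )

# Example usage: inspect the first annotated example.
for image, question, answer, event, refs in load_examples("e-vqa.json"):
    print(event, "|", question, "->", answer)
    break
```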
