Abstract
While models for Visual Question Answering (VQA) have steadily improved over the years, interacting with one quickly reveals that these models lack consistency. For instance, if a model answers “red” to “What color is the balloon?”, it might answer “no” when asked “Is the balloon red?”. These responses violate simple notions of entailment and raise questions about how effectively VQA models ground language. In this work, we introduce a dataset, ConVQA, and metrics that enable quantitative evaluation of consistency in VQA. For a given observable fact in an image (e.g., the balloon’s color), we generate a set of logically consistent question-answer (QA) pairs (e.g., “Is the balloon red?”) and also collect a human-annotated set of commonsense-based consistent QA pairs (e.g., “Is the balloon the same color as tomato sauce?”). Further, we propose a consistency-improving data-augmentation module, the Consistency Teacher Module (CTM). CTM automatically generates entailed (or similar-intent) questions for a source QA pair and fine-tunes the VQA model on a generated QA pair if the model’s answer to the entailed question is consistent with the source QA pair. We demonstrate that CTM-based training improves the consistency of VQA models on the ConVQA datasets and provides a strong baseline for further research.
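The consistency metrics themselves are defined in the full paper; purely as an illustrative sketch, one natural group-level measure, the fraction of entailed QA groups that a model answers without any inconsistency, could be computed as follows (the data layout and the predict interface are hypothetical stand-ins, not the paper’s released code):

```python
from typing import Callable, List, Tuple

# A "group" bundles an image with all logically/commonsense
# consistent QA pairs derived from one observable fact in it
# (e.g., every question about the balloon's color).
QAGroup = Tuple[str, List[Tuple[str, str]]]  # (image_path, [(question, answer), ...])

def consistency_score(
    groups: List[QAGroup],
    predict: Callable[[str, str], str],  # (image_path, question) -> answer
) -> float:
    """Fraction of groups in which every entailed question is
    answered in agreement with its ground-truth answer."""
    fully_consistent = sum(
        all(predict(img, q) == a for q, a in qas)
        for img, qas in groups
    )
    return fully_consistent / len(groups)
```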
Highlights
Visual Question Answering (VQA) (Antol et al., 2015) involves answering natural language questions about images.
To improve the consistency of VQA models, we propose a Consistency Teacher Module (CTM), which consists of a Question Generator that synthesizes entailed (or similar-intent) questions given a seed QA pair and a Consistency Checker that examines whether the model’s answers to those similar-intent questions are consistent.
We demonstrate that our approach improves the performance of a baseline VQA model on our ConVQA test sets in terms of both accuracy and consistency.
Summary
Visual Question Answering (VQA) (Antol et al., 2015) involves answering natural language questions about images. Consistent question-answer (QA) pairs can be derived based on simple notions of logic or by commonsense reasoning. For instance, if an image contains a “vegetarian pizza”, a consistent QA pair can be “is it a vegetarian pizza?” – “yes”. While attempts have been made to construct logic-based consistent VQA datasets (Hudson and Manning, 2019), they still fall short on commonsense-based consistency. To improve the consistency of VQA models, we propose a Consistency Teacher Module (CTM), which consists of a Question Generator that synthesizes entailed (or similar-intent) questions given a seed QA pair and a Consistency Checker that examines whether the model’s answers to those similar-intent questions are consistent. Our datasets and models will be available at https://bit.ly/32exlM7.
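To make the CTM pipeline concrete, a minimal sketch of the data-augmentation loop described above might look like the following; the QuestionGenerator and ConsistencyChecker interfaces and all names here are hypothetical stand-ins under our reading of the abstract, not the paper’s released implementation:

```python
def ctm_augment(vqa_model, question_generator, consistency_checker, seed_qas):
    """Sketch of one CTM pass: generate entailed questions for each
    source QA pair, keep those the model answers consistently with
    the source, and fine-tune the VQA model on the retained pairs."""
    augmented = []
    for image, question, answer in seed_qas:
        for gen_q, gen_a in question_generator.generate(question, answer):
            pred = vqa_model.answer(image, gen_q)
            # Retain the generated pair only when the model's answer
            # is consistent with the source QA pair.
            if consistency_checker.is_consistent((question, answer), (gen_q, pred)):
                augmented.append((image, gen_q, gen_a))
    vqa_model.fine_tune(augmented)
    return augmented
```

Filtering on the model’s own consistent answers keeps the augmentation automatic: no human annotation is needed beyond the seed QA pairs.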