A Case Study of the Shortcut Effects in Visual Commonsense Reasoning

Keren Ye, Adriana Kovashka

doi:10.1609/aaai.v35i4.16428

Abstract

Visual reasoning and question answering have gathered attention in recent years. Many datasets and evaluation protocols have been proposed; some have been shown to contain bias that allows models to "cheat" without performing true, generalizable reasoning. A well-known bias is dependence on language priors (the frequency of answers), which results in the model not looking at the image. We discover a new type of bias in the Visual Commonsense Reasoning (VCR) dataset. In particular, we show that most state-of-the-art models exploit co-occurring text between the input (question) and the output (answer options), and rely on only a few pieces of information in the candidate options to make a decision. Unfortunately, relying on such superficial evidence makes models very fragile. To measure fragility, we propose two ways to modify the validation data, in which a few words in the answer choices are changed without significantly altering their meaning. We find that such minor changes cause models' performance to degrade significantly. To resolve the issue, we propose a curriculum-based masking approach as a mechanism for more robust training. Our method improves the baseline by requiring it to pay attention to the answers as a whole, and is more effective than prior masking strategies.
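
The shortcut described in the abstract, text that co-occurs in the question and an answer option, can be illustrated with a toy sketch. This is not the authors' code or their masking method: the tokenizer, the function names (`overlap_score`, `pick_by_overlap`, `mask_shared_tokens`), the `[MASK]` token, and the example question are all assumptions made up for illustration. It only shows how a "model" could pick an answer from word overlap alone, and how masking the shared words removes that cue.

```python
import re

def tokenize(text):
    """Lowercase and split into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def overlap_score(question, option):
    """Count the distinct tokens an answer option shares with the question."""
    return len(set(tokenize(question)) & set(tokenize(option)))

def pick_by_overlap(question, options):
    """Shortcut 'model': return the index of the option with the most overlap.

    Ties resolve to the first option with the maximum score.
    """
    scores = [overlap_score(question, o) for o in options]
    return scores.index(max(scores))

def mask_shared_tokens(question, option, mask="[MASK]"):
    """Replace option words that also occur in the question with a mask token.

    Hypothetical helper: it hides the co-occurring words so that overlap
    can no longer be used as a cue.
    """
    q_tokens = set(tokenize(question))
    words = option.split()
    return " ".join(
        mask if w.lower().strip(".,?!") in q_tokens else w for w in words
    )

# Made-up example in the style of a multiple-choice question (not from VCR).
question = "Why is person1 pointing at person2?"
options = [
    "He is telling person2 that person1 ordered the pancakes.",
    "The weather is nice today.",
    "She wants to leave.",
]

best = pick_by_overlap(question, options)
masked = mask_shared_tokens(question, options[0])
```

In this toy case the overlap heuristic selects the first option without any access to an image, which is the kind of superficial evidence the abstract warns about. A curriculum, as the abstract names it, would presumably vary how much masking is applied over the course of training; that schedule is not specified here and is not sketched.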

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: May 18, 2021
Citations: 11
