Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly

Spencer Whitehead, Trevor Darrell, Vedaad Shakib, Suzanne Petryk, Anna Rohrbach, Joseph Gonzalez, Marcus Rohrbach

doi:10.1007/978-3-031-20059-5_9

Abstract

Machine learning has advanced dramatically, narrowing the accuracy gap to humans in multimodal tasks like visual question answering (VQA). However, while humans can say “I don’t know” when they are uncertain (i.e., abstain from answering a question), such ability has been largely neglected in multimodal research, despite the importance of this problem to the usage of VQA in real settings. In this work, we promote a problem formulation for reliable VQA, where we prefer abstention over providing an incorrect answer. We first enable abstention capabilities for several VQA models, and analyze both their coverage, the portion of questions answered, and risk, the error on that portion. For that, we explore several abstention approaches. We find that although the best performing models achieve over 71% accuracy on the VQA v2 dataset, introducing the option to abstain by directly using a model’s softmax scores limits them to answering less than 8% of the questions to achieve a low risk of error (i.e., 1%). This motivates us to utilize a multimodal selection function to directly estimate the correctness of the predicted answers, which we show can increase the coverage by, for example, \(2.4\times\) from 6.8% to 16.3% at 1% risk. While it is important to analyze both coverage and risk, these metrics have a trade-off which makes comparing VQA models challenging. To address this, we also propose an Effective Reliability metric for VQA that places a larger cost on incorrect answers compared to abstentions. This new problem formulation, metric, and analysis for VQA provide the groundwork for building effective and reliable VQA models that have the self-awareness to abstain if and only if they don’t know the answer. Code and Models: https://github.com/facebookresearch/reliable_vqa.
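
To make the quantities in the abstract concrete, the following is a minimal sketch of coverage, risk, and a cost-weighted reliability score under a simple confidence threshold. It is not the authors' implementation (see the linked repository for that). The function names, the binary notion of correctness, the toy data, and the exact scoring rule (+1 for a correct answer, `-cost` for an incorrect one, 0 for an abstention) are all assumptions chosen for illustration; the abstract only states that incorrect answers cost more than abstentions.

```python
from typing import Sequence, Tuple


def coverage_and_risk(
    confidences: Sequence[float],
    correct: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Answer when confidence >= threshold, otherwise abstain.

    Coverage is the fraction of questions answered; risk is the error
    rate on the answered portion (0.0 if nothing is answered).
    Correctness is treated as binary here, which is a simplification.
    """
    answered = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(answered) / len(confidences)
    risk = sum(1 for ok in answered if not ok) / len(answered) if answered else 0.0
    return coverage, risk


def effective_reliability(
    confidences: Sequence[float],
    correct: Sequence[bool],
    threshold: float,
    cost: float = 1.0,
) -> float:
    """Hypothetical cost-weighted score averaged over all questions.

    Assumed scoring: +1 for a correct answer, -cost for an incorrect
    answer, 0 for an abstention.
    """
    total = 0.0
    for conf, ok in zip(confidences, correct):
        if conf < threshold:
            continue  # abstained: contributes 0
        total += 1.0 if ok else -cost
    return total / len(confidences)


# Toy data (invented for illustration): five questions with a model
# confidence (e.g., a max softmax score) and whether the answer was right.
confidences = [0.95, 0.90, 0.80, 0.60, 0.40]
correct = [True, True, False, True, False]

for threshold in (0.0, 0.5, 0.85):
    cov, risk = coverage_and_risk(confidences, correct, threshold)
    score = effective_reliability(confidences, correct, threshold, cost=1.0)
    print(f"threshold={threshold:.2f} coverage={cov:.2f} risk={risk:.2f} score={score:.2f}")
```

Raising the threshold lowers risk at the expense of coverage, which is the trade-off the abstract describes; a single cost-weighted score gives one number to compare models by instead of a pair.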
