Robust Visual Question Answering: Datasets, Methods, and Future Challenges.

Jie Ma, Hongbin Pei, Pinghui Wang, Dechen Kong, Zewei Wang, Jun Liu, Junzhou Zhao

doi:10.1109/tpami.2024.3366154

Abstract

Visual question answering (VQA) requires a system to provide an accurate natural language answer given an image and a natural language question. However, it is widely recognized that previous generic VQA methods tend to memorize biases present in the training data rather than learn proper behaviors, such as grounding images before predicting answers. As a result, these methods usually achieve high in-distribution but poor out-of-distribution performance. In recent years, various datasets and debiasing methods have been proposed to evaluate and enhance VQA robustness, respectively. This paper provides the first comprehensive survey focused on this emerging topic. Specifically, we first provide an overview of the development of datasets from in-distribution and out-of-distribution perspectives. Second, we examine the evaluation metrics employed by these datasets. Third, we propose a typology that presents the development process, similarities and differences, robustness comparison, and technical features of existing debiasing methods. Furthermore, we analyze and discuss the robustness of representative vision-and-language pre-training models on VQA. Finally, through a thorough review of the available literature and experimental analysis, we discuss the key areas for future research from various viewpoints.

More From: IEEE transactions on pattern analysis and machine intelligence
