Abstract
Recent studies point out that VQA models tend to rely on language priors in the training data to answer questions, which prevents them from generalizing to out-of-distribution test data. To address this problem, existing approaches reduce the effect of the language prior by constructing negative image–question pairs, but they cannot provide a proper visual rationale for answering the question. In this paper, we present a new debiasing framework for VQA by Learning to Sample paired image–question data and a Prompt for the given question (LSP). Specifically, we construct negative image–question pairs at a certain sampling rate to prevent the model from overly relying on visual shortcut content. Notably, question types provide a strong hint for answering questions. We therefore use the question type to constrain the sampling of negative question–image pairs, and further learn a question-type-guided prompt for better question comprehension. Extensive experiments on two public benchmarks, VQA-CP v2 and VQA v2, demonstrate that our model achieves new state-of-the-art overall accuracy, i.e., 61.95% and 65.26%, respectively.
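To make the sampling idea concrete, here is a minimal sketch of question-type-constrained negative sampling. All names (`build_negative_pairs`, the `qtype` field, `sample_rate`) are illustrative assumptions, not the paper's actual implementation: with some probability, each question's paired image is swapped for an image from another example that shares the same question type.

```python
import random

def build_negative_pairs(dataset, sample_rate=0.3, seed=0):
    """For each (image, question) example, with probability `sample_rate`
    replace the image with one drawn from a different example whose
    question has the same question type (a hypothetical sketch of the
    sampling step described in the abstract)."""
    rng = random.Random(seed)

    # Group image ids by question type so negatives stay type-consistent.
    by_type = {}
    for ex in dataset:
        by_type.setdefault(ex["qtype"], []).append(ex["image"])

    pairs = []
    for ex in dataset:
        image = ex["image"]
        if rng.random() < sample_rate:
            # Candidate negatives: images from other examples of the same type.
            candidates = [img for img in by_type[ex["qtype"]] if img != ex["image"]]
            if candidates:
                image = rng.choice(candidates)  # negative image, same question type
        pairs.append({"image": image,
                      "question": ex["question"],
                      "qtype": ex["qtype"]})
    return pairs
```

Constraining negatives to the same question type keeps the mismatch plausible (e.g., a "what color" question is paired with another image that a color question could apply to), which is what lets the model learn that the answer must come from the image rather than from the question alone.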