Abstract

Many visual question answering (VQA) models suffer from bias: when the training data exhibits a strong mapping between questions and answers, the model generalizes poorly. Existing work on such biased predictions focuses mainly on language bias while ignoring the bias introduced by images. To improve the robustness of VQA models, a bias reduction method is proposed, and on this basis the influence of language and visual information on bias is explored. Two bias learning branches are constructed to capture language bias and the bias caused jointly by language and images, and the bias reduction method is applied to obtain more robust predictions. Finally, samples are dynamically weighted according to the difference in prediction probability between the standard VQA branch and the bias branches, so that the model can adjust how strongly it learns from samples with different bias levels. Experiments on datasets including VQA-CP v2.0 demonstrate that the proposed method is effective and alleviates the influence of bias on the model.
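The abstract describes a main VQA branch, two bias branches, and a per-sample weight derived from the gap between their predictions. A minimal sketch of that idea is shown below; the weighting rule, branch names, and the simple averaging of the two bias branches are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def debiased_loss(main_logits, lang_bias_logits, fused_bias_logits, labels):
    """Hypothetical sketch of bias-aware sample weighting.

    main_logits        : standard VQA branch predictions, shape (N, A)
    lang_bias_logits   : language-only bias branch, shape (N, A)
    fused_bias_logits  : language+image bias branch, shape (N, A)
    labels             : gold answer indices, shape (N,)
    """
    p_main = softmax(main_logits)
    p_lang = softmax(lang_bias_logits)
    p_fused = softmax(fused_bias_logits)
    idx = np.arange(len(labels))
    # Confidence the bias branches assign to the gold answer
    # (averaging the two branches is an assumption for this sketch).
    bias_conf = 0.5 * (p_lang[idx, labels] + p_fused[idx, labels])
    main_conf = p_main[idx, labels]
    # Dynamic weight: down-weight samples the bias branches already
    # answer more confidently than the main branch.
    w = 1.0 - np.clip(bias_conf - main_conf, 0.0, 1.0)
    ce = -np.log(main_conf + 1e-12)  # cross-entropy on the main branch
    return float((w * ce).mean())
```

With this rule, a sample whose answer the bias branches predict easily contributes less to the loss, while hard, less-biased samples keep full weight.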

