Multi-stage reasoning on introspecting and revising bias for visual question answering

L An-An, Lu Zimu, Lv Bo, Zheng Bolun, Shao Zhuang, Duan Yulong, Xu Ning, Li Xuanya, Yan Chenggang, Liu Min

doi:10.1145/3616399

Abstract

Visual Question Answering (VQA) is the task of predicting the answer to a question based on the content of an image. However, recent VQA methods rely more on language priors between the question and the answer than on the image content. To address this issue, many debiasing methods have been proposed to reduce language bias in model reasoning. Bias, however, can be divided into two categories: good bias and bad bias. Good bias can benefit answer prediction, while bad bias may lead the model to rely on unrelated information. Therefore, instead of excluding good and bad bias indiscriminately, as existing debiasing methods do, we propose a bias discrimination module to distinguish between them. Additionally, bad bias may reduce the model’s reliance on image content during answer reasoning, so that little attention is paid to updating image features. To tackle this, we leverage Markov theory to construct a Markov field with image regions and question words as nodes. This helps with feature updating for both image regions and question words, thereby facilitating more accurate and comprehensive reasoning about both the image content and the question. We evaluate our network on the VQA v2 and VQA-CP v2 datasets and conduct extensive quantitative and qualitative studies to verify its effectiveness. Experimental results show that our network achieves strong performance compared with previous state-of-the-art methods.
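
To make the idea of a graph with image regions and question words as nodes more concrete, the following is a minimal sketch of joint feature updating over such a graph. It is not the paper's Markov field formulation, which the abstract does not specify. It substitutes a simple similarity-weighted message-passing step on a fully connected graph, and the function name `update_nodes`, the 0.5 mixing weight, and the feature shapes are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def update_nodes(regions, words, steps=1):
    """Hypothetical joint update of image-region and question-word features.

    regions: (R, d) array of image-region features (assumed shape).
    words:   (W, d) array of question-word features (assumed shape).
    Every node exchanges messages with every other node, so region
    features are updated using words and vice versa.
    """
    nodes = np.concatenate([regions, words], axis=0)  # (R + W, d)
    dim = nodes.shape[1]
    for _ in range(steps):
        # Pairwise compatibility between all nodes (scaled dot product).
        scores = nodes @ nodes.T / np.sqrt(dim)
        # Exclude self-messages so each node listens only to its neighbours.
        np.fill_diagonal(scores, -np.inf)
        weights = softmax(scores, axis=1)
        messages = weights @ nodes
        # Mix each node's old feature with its incoming message
        # (the 0.5 weight is an arbitrary illustrative choice).
        nodes = 0.5 * nodes + 0.5 * messages
    num_regions = regions.shape[0]
    return nodes[:num_regions], nodes[num_regions:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    region_feats = rng.normal(size=(3, 4))  # 3 regions, 4-dim features
    word_feats = rng.normal(size=(2, 4))    # 2 question words
    new_regions, new_words = update_nodes(region_feats, word_feats, steps=2)
    print(new_regions.shape, new_words.shape)
```

The point of the sketch is only that both modalities share one node set, so an update step changes region and word features together rather than updating the question side alone.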

Journal: ACM Transactions on the Web
Publication Date: Oct 8, 2024
Citations: 1

Similar Papers

Multi-Modal Explicit Sparse Attention Networks for Visual Question Answering.
Zihan Guo ... Dezhi Han
Sensors (Basel, Switzerland) | VOL. 20
26 Nov 2020

Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
Yibing Liu ... Jianhua Yin
ACM Transactions on Multimedia Computing, Communications, and Applications | VOL. 18
04 Mar 2022

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal ... Tejas Khot
International Journal of Computer Vision | VOL. 127
11 Sep 2018

Positional Attention Guided Transformer-Like Architecture for Visual Question Answering
Aihua Mao ... Jun Xuan
IEEE Transactions on Multimedia | VOL. 25
01 Jan 2023
