On the role of question encoder sequence model in robust visual question answering

Gouthaman Kv, Anurag Mittal

doi:10.1016/j.patcog.2022.108883

Abstract

Generalizing beyond the training experience plays a significant role in developing robust and practical machine learning systems. Current Visual Question Answering (VQA) models have been shown to be over-dependent on language priors (spurious correlations between question types and their most frequent answers) in the training set, and consequently they perform poorly on Out-of-Distribution (OOD) test sets. This behavior undermines the robustness of VQA models and limits their use in real-world situations. This paper shows that the sequence model architecture used in the question encoder plays a significant role in the OOD performance of VQA models. To demonstrate this, we performed a detailed analysis of various existing RNN-based and Transformer-based question encoders, and we additionally proposed a novel Graph Attention Network (GAT)-based question encoder. Our study found that a better choice of sequence model in the question encoder reduces overfitting to language biases and improves OOD performance in VQA, even without any additional, relatively complex bias-mitigation approaches.
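To make the notion of a language prior concrete, the following is a minimal sketch of a "prior-only" answerer that ignores the image entirely and returns the most frequent training answer for each question type. It is not the authors' method or evaluation protocol. The question-type heuristic (first two words), the helper names `question_type`, `fit_prior` and `accuracy`, and the toy question/answer pairs are all hypothetical and exist only for illustration. The sketch shows how exploiting such a correlation can look perfect when the test answer distribution matches training, yet fail completely when the per-type answer distribution shifts, as it does on OOD test sets.

```python
from collections import Counter, defaultdict


def question_type(question: str, n_words: int = 2) -> str:
    """Hypothetical question-type key: the first n lowercase words."""
    return " ".join(question.lower().split()[:n_words])


def fit_prior(train_pairs):
    """Map each question type to its most frequent training answer (the 'language prior')."""
    counts = defaultdict(Counter)
    for question, answer in train_pairs:
        counts[question_type(question)][answer] += 1
    return {qt: c.most_common(1)[0][0] for qt, c in counts.items()}


def accuracy(prior, pairs, fallback="yes"):
    """Fraction of pairs where the prior-only prediction equals the ground-truth answer.

    Unseen question types fall back to a fixed (hypothetical) default answer.
    """
    hits = sum(prior.get(question_type(q), fallback) == a for q, a in pairs)
    return hits / len(pairs)


# Toy training data (hypothetical): "what color" is mostly "white", "is there" is mostly "yes".
train = [
    ("What color is the cat?", "white"),
    ("What color is the bus?", "white"),
    ("What color is the sky?", "blue"),
    ("Is there a dog?", "yes"),
    ("Is there a car?", "yes"),
    ("Is there a tree?", "no"),
]
# In-distribution test: same per-type answer distribution as training.
iid_test = [("What color is the plate?", "white"), ("Is there a man?", "yes")]
# OOD test: the per-type answer distribution is shifted away from the training prior.
ood_test = [("What color is the car?", "red"), ("Is there a boat?", "no")]

prior = fit_prior(train)
print(accuracy(prior, iid_test))  # 1.0
print(accuracy(prior, ood_test))  # 0.0
```

A model that has over-fit to these question-type/answer correlations behaves much like this baseline, which is why the abstract's concern is whether the question encoder encourages or resists learning them.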

Journal: Pattern Recognition	Publication Date: Jul 3, 2022
Citations: 1

Similar Papers

Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Aishwarya Agrawal ... Dhruv Batra
01 Jun 2018

Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding
Qingxing Cao ... Liang Lin
IEEE Transactions on Neural Networks and Learning Systems | VOL. 33
01 Jan 2020

Can Pre-training help VQA with Lexical Variations?
Shailza Jolly ... Shubham Kapoor
01 Jan 2020

Adversarial Sample Synthesis for Visual Question Answering
Chuanhao Li ... Yuwei Wu
ACM Transactions on Multimedia Computing, Communications, and Applications
16 Sep 2024