Abstract
Visual Question Answering (VQA) aims to evaluate the reasoning abilities of an intelligent agent using visual and textual information. However, recent research indicates that many VQA models rely primarily on learned correlations between questions and answers in the training data rather than on genuine reasoning. To address this limitation, we propose a novel training approach, Enhancing Robust VQA via Contrastive and Self-supervised Learning (CSL-VQA), for building more robust VQA models. Our approach generates two types of negative samples to balance the biased data, uses self-supervised auxiliary tasks to help the base VQA model overcome language priors, and filters out biased training samples. In addition, we construct positive samples by removing spurious correlations from biased samples and perform auxiliary training through contrastive learning. The approach requires no additional annotations and is compatible with different VQA backbones. Experimental results demonstrate that CSL-VQA significantly outperforms current state-of-the-art approaches, achieving an accuracy of 62.30% on the VQA-CP v2 dataset while maintaining robust performance on the in-distribution VQA v2 dataset. Moreover, our method shows superior generalization on challenging datasets such as GQA-OOD and VQA-CE, confirming its effectiveness in reducing language bias and enhancing the overall robustness of VQA models.
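To make the contrastive auxiliary training described above concrete, the following is a minimal sketch, not the authors' exact formulation: it assumes an InfoNCE-style objective in which the anchor is the joint embedding of the original (image, question) pair, the positive is the same pair with spurious question-only cues removed, and the negatives are mismatched pairs. The function name `contrastive_debias_loss` and the tensor shapes are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F


def contrastive_debias_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss over joint VQA embeddings (illustrative sketch).

    anchor:    (B, D) embedding of the original (image, question) pair
    positive:  (B, D) embedding of the same pair after removing spurious
               (language-prior) cues
    negatives: (B, K, D) embeddings of mismatched pairs, e.g. the question
               paired with a shuffled image or an irrelevant question
    """
    # Normalize so that dot products correspond to cosine similarity.
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    # Similarity of each anchor to its positive: (B, 1)
    pos_sim = (anchor * positive).sum(dim=-1, keepdim=True)
    # Similarity of each anchor to its K negatives: (B, K)
    neg_sim = torch.einsum("bd,bkd->bk", anchor, negatives)

    # The positive sits at index 0 of the logits; cross-entropy against
    # label 0 pulls the anchor toward its positive and away from negatives.
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```

In a setup like this, the contrastive term would be added to the base VQA loss as an auxiliary objective with a weighting coefficient; the weighting scheme and the construction of positives and negatives follow the paper's method rather than this sketch.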