Abstract
Visual Question Answering (VQA) is an important research direction in artificial intelligence: it requires a model to jointly understand a visual image and a natural-language question, and to answer questions about the image's content. Recent studies have shown that many VQA models rely on superficial statistical correlations between questions and answers, which weakens the connection between visual content and textual information. In this work, we propose an unbiased VQA method that mitigates language priors by strengthening the contrast between the correct answer and the positive and negative predictions. We design a new model consisting of two modules with different roles. The image and its corresponding question are fed into the Answer Visual Attention Module to generate a positive prediction, while a Dual Channels Joint Module generates a negative prediction that carries strong linguistic prior knowledge. Finally, the positive and negative predictions, together with the correct answer, are passed to our newly designed loss function for training. Our method achieves high performance (61.24%) on the VQA-CP v2 dataset. In addition, most existing debiasing methods improve performance on VQA-CP v2 at the cost of reduced performance on VQA v2, whereas our method does not sacrifice accuracy on VQA v2; instead, it improves performance on both datasets.
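The abstract does not give the exact form of the proposed loss, but the following is a minimal sketch of how a contrast between a positive prediction, a negative (language-prior) prediction, and the correct answer could be implemented in PyTorch. The function name `debiased_vqa_loss`, the margin formulation, and the `margin` parameter are illustrative assumptions, not the paper's actual objective.

```python
import torch
import torch.nn.functional as F

def debiased_vqa_loss(pos_logits: torch.Tensor,
                      neg_logits: torch.Tensor,
                      answer_ids: torch.Tensor,
                      margin: float = 1.0) -> torch.Tensor:
    """Hypothetical contrastive-style objective.

    pos_logits: (B, A) scores from the unbiased (visual-attention) branch.
    neg_logits: (B, A) scores from the biased (language-prior) branch.
    answer_ids: (B,) indices of the ground-truth answers.
    """
    # Cross-entropy keeps the positive branch accurate on the true answer.
    ce = F.cross_entropy(pos_logits, answer_ids)

    # Score each branch assigns to the ground-truth answer.
    pos_score = pos_logits.gather(1, answer_ids.unsqueeze(1)).squeeze(1)
    neg_score = neg_logits.gather(1, answer_ids.unsqueeze(1)).squeeze(1)

    # Margin term: the unbiased prediction should outscore the
    # language-prior prediction on the correct answer by at least `margin`.
    contrast = F.relu(margin - (pos_score - neg_score)).mean()

    return ce + contrast
```

Under this formulation, the negative branch acts as a foil: the model is penalized whenever the prior-driven prediction scores the correct answer nearly as highly as the visually grounded one, which pushes the positive branch to rely on image evidence rather than question statistics.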