Abstract

Most existing visual question answering (VQA) models rely heavily on language biases to answer questions, i.e., they tend to overfit question-answer pairs in the training split and perform poorly on the test split when the answer distributions differ. This behavior makes them difficult to apply in real-world scenarios. To reduce language biases, previous studies mainly integrate modules to overcome language priors (ensemble-based methods) or generate additional training data to balance dataset biases (data-balanced methods). However, existing ensemble-based methods all suffer accuracy drops on the VQA v2 dataset, while data-balanced methods may introduce new biases and cannot guarantee the quality of the generated data. In this paper, we propose a model-agnostic training scheme called Suppressing Biased Samples (SBS) to overcome language priors. SBS consists of two collaborative parts: a Data Classifier Module, which divides the dataset into biased and unbiased samples based on their similarity in the semantic space, and a Bias Penalty Module, which suppresses the biased samples to weaken their influence. As a new way of balancing data to address language bias, SBS overcomes the shortcomings of previous data-balanced methods. Experimental results show that our method can be combined with other bias-reduction methods and achieves new state-of-the-art performance on the commonly used VQA-CP v2 dataset.
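To make the two-module design concrete, the following is a minimal sketch of how such a scheme could be implemented. The abstract does not specify the similarity measure, the classification threshold, or the penalty formula, so the cosine-similarity criterion, the `threshold` and `penalty` values, and the helper names `classify_samples` and `sbs_loss` below are all illustrative assumptions, not the paper's actual method.

```python
import torch
import torch.nn.functional as F

def classify_samples(question_emb, answer_emb, threshold=0.8):
    """Data Classifier Module (assumed form): flag a sample as 'biased' when
    its question embedding is highly similar to its answer embedding in a
    shared semantic space, suggesting the answer is predictable from the
    question alone."""
    sim = F.cosine_similarity(question_emb, answer_emb, dim=-1)  # (batch,)
    return sim > threshold  # boolean mask, True = biased sample

def sbs_loss(logits, targets, biased_mask, penalty=0.3):
    """Bias Penalty Module (assumed form): down-weight the per-sample loss
    of biased samples so they contribute less to the gradient update."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")  # (batch,)
    weights = torch.where(biased_mask,
                          torch.full_like(per_sample, penalty),
                          torch.ones_like(per_sample))
    return (weights * per_sample).mean()

# Usage sketch inside a training step (base VQA model left abstract, since
# SBS is described as model-agnostic):
#   biased_mask = classify_samples(q_emb, a_emb)
#   loss = sbs_loss(model(images, questions), targets, biased_mask)
#   loss.backward()
```

Because the suppression happens purely through loss re-weighting, this style of scheme leaves the underlying VQA architecture untouched, which is consistent with the abstract's claim that SBS is model-agnostic and can be combined with other bias-reduction methods.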
