Abstract
Visual Question Answering (VQA) is an important research direction in artificial intelligence: it requires a model to jointly understand a visual image and a natural-language question, and to answer questions about the image's content. Recent studies have shown that many VQA models rely on superficial statistical correlations between questions and answers, which weakens the connection between visual content and textual information. In this work, we propose an unbiased VQA method that mitigates language priors by strengthening the contrast between the correct answer and the positive and negative predictions. We design a new model consisting of two modules with different roles. The image and its corresponding question are fed into the Answer Visual Attention Module to generate a positive prediction, while a Dual Channels Joint Module generates a negative prediction that carries strong linguistic prior knowledge. Finally, the positive and negative predictions, together with the correct answer, are passed to our newly designed loss function for training. Our method achieves high performance (61.24%) on the VQA-CP v2 dataset. In addition, most existing debiasing methods improve performance on VQA-CP v2 at the cost of reduced performance on VQA v2, whereas our method does not sacrifice accuracy on VQA v2; instead, it improves performance on both datasets.
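The abstract does not give the exact form of the proposed loss, but the following is a minimal sketch of how a contrast between a positive prediction, a negative (language-prior) prediction, and the correct answer could be implemented in PyTorch. The function name `debiased_vqa_loss`, the margin formulation, and the `margin` parameter are illustrative assumptions, not the paper's actual objective.

```python
import torch
import torch.nn.functional as F

def debiased_vqa_loss(pos_logits: torch.Tensor,
                      neg_logits: torch.Tensor,
                      answer_ids: torch.Tensor,
                      margin: float = 1.0) -> torch.Tensor:
    """Hypothetical contrastive-style objective.

    pos_logits: (B, A) scores from the unbiased (visual-attention) branch.
    neg_logits: (B, A) scores from the biased (language-prior) branch.
    answer_ids: (B,) indices of the ground-truth answers.
    """
    # Cross-entropy keeps the positive branch accurate on the true answer.
    ce = F.cross_entropy(pos_logits, answer_ids)

    # Score each branch assigns to the ground-truth answer.
    pos_score = pos_logits.gather(1, answer_ids.unsqueeze(1)).squeeze(1)
    neg_score = neg_logits.gather(1, answer_ids.unsqueeze(1)).squeeze(1)

    # Margin term: the unbiased prediction should outscore the
    # language-prior prediction on the correct answer by at least `margin`.
    contrast = F.relu(margin - (pos_score - neg_score)).mean()

    return ce + contrast
```

Under this formulation, the negative branch acts as a foil: the model is penalized whenever the prior-driven prediction scores the correct answer nearly as highly as the visually grounded one, which pushes the positive branch to rely on image evidence rather than question statistics.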