Abstract

Although state-of-the-art models achieve impressive performance on Visual Question Answering (VQA), they often exploit dataset biases to answer questions. Recently, some studies have synthesized counterfactual training samples to help models mitigate these biases. However, such synthetic samples require extra annotations and often contain noise. Moreover, these methods simply add the synthetic samples to the training data and train with the cross-entropy loss, which does not make the best use of the synthetic samples for bias mitigation. In this article, to mitigate biases in VQA more effectively, we propose a Hierarchical Counterfactual Contrastive Learning (HCCL) method. First, to avoid introducing noise and extra annotations, our method automatically masks unimportant features in the original pairs to obtain positive samples and creates mismatched question-image pairs as negative samples. Then our method uses feature-level and answer-level contrastive learning to pull the original sample close to the positive samples in the feature space, while pushing it away from the negative samples in both the feature and answer spaces. In this way, the VQA model learns robust multimodal features and attends to both visual and language information when producing an answer. HCCL can be adopted in different baselines, and experimental results on the VQA v2, VQA-CP, and GQA-OOD datasets show that our method effectively mitigates biases in VQA and improves the robustness of the VQA model.
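The abstract does not give the exact loss functions, but the contrastive objective it describes (pull the original sample toward a feature-masked positive, push it away from mismatched question-image negatives) can be illustrated with a generic InfoNCE-style sketch. Everything below is a hypothetical, simplified illustration using NumPy: the function names, the 128-dimensional features, and the temperature value are assumptions, not the paper's implementation.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss (hypothetical sketch).

    anchor:    fused feature of the original question-image pair
    positive:  feature of the same pair with unimportant features masked
    negatives: features of mismatched question-image pairs
    """
    sims = [cosine_sim(anchor, positive)] + [cosine_sim(anchor, n) for n in negatives]
    logits = np.array(sims) / temperature
    logits -= logits.max()  # numerical stability before the softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    # Cross-entropy against the positive, which sits at index 0.
    return -np.log(probs[0])

# Toy usage: a masked copy stays close to the anchor, mismatched pairs do not.
rng = np.random.default_rng(0)
anchor = rng.normal(size=128)
positive = anchor + 0.05 * rng.normal(size=128)        # masking leaves features similar
negatives = [rng.normal(size=128) for _ in range(8)]   # mismatched question-image pairs
loss = info_nce_loss(anchor, positive, negatives)
```

In HCCL this kind of objective would be applied at two levels: on the multimodal features (feature-level) and on the answer predictions (answer-level), so that biased shortcuts that ignore one modality are penalized in both spaces.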
