Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA

Chengen Lai, Sitong Yan, Jingyang Li, Shengli Song, Shiqi Meng, Guangneng Hu

doi:10.1609/aaai.v38i3.28065

Abstract

Natural language explanation in visual question answering (VQA-NLE) aims to explain a model's decision-making process by generating natural language sentences, thereby increasing users' trust in black-box systems. Existing post-hoc methods have made significant progress in producing plausible explanations. However, such post-hoc explanations are not always aligned with human logical inference and suffer from three issues: 1) deductive unsatisfiability: the generated explanations do not logically lead to the answer; 2) factual inconsistency: the model falsifies its counterfactual explanation for answers without considering the facts in images; and 3) semantic perturbation insensitivity: the model cannot recognize the semantic changes caused by small perturbations. These problems reduce the faithfulness of the explanations generated by models. To address these issues, we propose a novel self-supervised Multi-level Contrastive Learning based natural language Explanation model (MCLE) for VQA with semantic-level, image-level, and instance-level factual and counterfactual samples. MCLE extracts discriminative features and aligns the feature space of explanations with that of the visual question and answer to generate more consistent explanations. We conduct extensive experiments, ablation analyses, and a case study to demonstrate the effectiveness of our method on two VQA-NLE benchmarks.
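The abstract does not state the loss MCLE optimizes. As a rough illustration of the general idea only, the sketch below uses a generic InfoNCE-style contrastive loss, which is an assumption and not the authors' formulation. It pulls an explanation feature toward a factual sample and pushes it away from counterfactual samples at each of the three levels named above. The function names (`cosine`, `info_nce`, `multi_level_loss`), the temperature, the equal level weights, and the toy vectors standing in for encoder features are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (assumed here, not taken from the paper).

    Returns -log of the softmax probability assigned to the positive
    (factual) sample among the positive and all negatives
    (counterfactual samples).
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


def multi_level_loss(anchor, levels, weights=None):
    """Weighted sum of per-level InfoNCE losses.

    `levels` maps a level name to a (positive, negatives) pair.
    Equal weights are a placeholder choice, not a reported setting.
    """
    weights = weights or {name: 1.0 for name in levels}
    return sum(
        weights[name] * info_nce(anchor, pos, negs)
        for name, (pos, negs) in levels.items()
    )


# Toy 2-D features standing in for encoder outputs (hypothetical values).
explanation = [1.0, 0.0]
levels = {
    "semantic": ([0.9, 0.1], [[-1.0, 0.0]]),
    "image": ([1.0, 0.2], [[0.0, 1.0]]),
    "instance": ([0.8, 0.0], [[-0.5, 0.5], [0.0, -1.0]]),
}
total = multi_level_loss(explanation, levels)
```

Under these assumptions, the loss is small when the explanation feature lies close to its factual samples and far from the counterfactual ones, and it grows when that ordering is reversed.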

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Mar 24, 2024
Citations: 1

Similar Papers

CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
Leonard Salewski ... Zeynep Akata
01 Jan 2021

Visual Question Answering as Reading Comprehension
Hui Li ... Anton Van Den Hengel
01 Jun 2019

Improving Visual Question Answering by Referring to Generated Paragraph Captions
Hyounghun Kim ... Mohit Bansal
01 Jan 2019

Estimating Viewed Images with Natural Language Question Answering from fMRI Data
Saya Takada ... Ren Togo
01 Mar 2020
