Counterfactual explanations for misclassified images: How human and machine explanations differ

Eoin Delaney, Arjun Pakrashi, Derek Greene, Mark T Keane

doi:10.1016/j.artint.2023.103995

Open Access

https://doi.org/10.1016/j.artint.2023.103995

Journal: Artificial Intelligence
Publication Date: Aug 25, 2023
Citations: 3
License type: CC BY

Affiliation: University College Dublin

Abstract

Counterfactual explanations have emerged as a popular solution to the eXplainable AI (XAI) problem of elucidating the predictions of black-box deep-learning systems because people understand them easily, they apply across different problem domains, and they seem to be legally compliant. Although over 100 counterfactual methods exist in the XAI literature, each claiming to generate plausible explanations akin to those preferred by people, few of these methods (∼7%) have actually been tested on users. Even fewer studies adopt a user-centered perspective, for instance by asking people for their own counterfactual explanations to determine what they consider a “good explanation”. This gap in the literature is addressed here using a novel methodology that (i) gathers human-generated counterfactual explanations for misclassified images in two user studies and then (ii) compares these human-generated explanations to computationally generated explanations for the same misclassifications. Results indicate that humans do not “minimally edit” images when generating counterfactual explanations. Instead, they make larger, “meaningful” edits that better approximate prototypes in the counterfactual class. An analysis based on “explanation goals” is proposed to account for this divergence between human and machine explanations. The implications of these proposals for future work are discussed.

Full Text