How people reason with counterfactual and causal explanations for Artificial Intelligence decisions in familiar and unfamiliar domains

Lenart Celar,Ruth M J Byrne

doi:10.3758/s13421-023-01407-5

Abstract

Few empirical studies have examined how people understand counterfactual explanations for other people’s decisions, for example, “if you had asked for a lower amount, your loan application would have been approved”. Yet many current Artificial Intelligence (AI) decision support systems rely on counterfactual explanations to improve human understanding and trust. We compared counterfactual explanations to causal ones, i.e., “because you asked for a high amount, your loan application was not approved”, for an AI’s decisions in a familiar domain (alcohol and driving) and an unfamiliar one (chemical safety) in four experiments (n = 731). Participants were shown inputs to an AI system, its decisions, and an explanation for each decision; they attempted to predict the AI’s decisions, or to make their own decisions. Participants judged counterfactual explanations more helpful than causal ones, but counterfactuals did not improve the accuracy of their predictions of the AI’s decisions more than causals (Experiment 1). However, counterfactuals improved the accuracy of participants’ own decisions more than causals (Experiment 2). When the AI’s decisions were correct (Experiments 1 and 2), participants considered explanations more helpful and made more accurate judgements in the familiar domain than in the unfamiliar one; but when the AI’s decisions were incorrect, they considered explanations less helpful and made fewer accurate judgements in the familiar domain than the unfamiliar one, whether they predicted the AI’s decisions (Experiment 3a) or made their own decisions (Experiment 3b). The results corroborate the proposal that counterfactuals provide richer information than causals, because their mental representation includes more possibilities.

Full Text