On the (Complete) Reasons Behind Decisions

Adnan Darwiche, Auguste Hirth

doi:10.1007/s10849-022-09377-8

Open Access

Abstract

Recent work has shown that the input-output behavior of some common machine learning classifiers can be captured in symbolic form, allowing one to reason about the behavior of these classifiers using symbolic techniques. This includes explaining decisions, measuring robustness, and proving formal properties of machine learning classifiers by reasoning about the corresponding symbolic classifiers. In this work, we present a theory for unveiling the reasons behind the decisions made by Boolean classifiers and study some of its theoretical and practical implications. At the core of our theory is the notion of a complete reason, which can be viewed as a necessary and sufficient condition for why a decision was made. We show how the complete reason can be used for computing notions such as sufficient reasons (also known as PI-explanations and abductive explanations), how it can be used for determining decision and classifier bias, and how it can be used for evaluating counterfactual statements such as “a decision will stick even if ... because ...”. We present a linear-time algorithm for computing the complete reason behind a decision, assuming the classifier is represented by a Boolean circuit of appropriate form. We then show how the computed complete reason can be used to answer many queries about a decision in linear or polynomial time. We conclude with a case study that illustrates the various notions and techniques we introduced.
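To make the notion of a sufficient reason concrete, the following minimal sketch enumerates them by brute force for a toy Boolean classifier. It is not the linear-time circuit-based algorithm mentioned in the abstract: it takes time exponential in the number of features and is meant only as an illustration. The classifier, the feature ordering, and all function names are hypothetical. A sufficient reason is taken here to be a minimal set of features whose values in the instance force the decision regardless of the remaining features.

```python
from itertools import combinations, product


def classifier(x):
    # Hypothetical toy classifier over three Boolean features:
    # (passed exam AND good grades) OR work experience.
    e, g, w = x
    return (e and g) or w


def is_sufficient(f, instance, subset):
    """Return True if fixing the features in `subset` to their values in
    `instance` forces the same decision for every setting of the others."""
    decision = f(instance)
    free = [i for i in range(len(instance)) if i not in subset]
    for values in product([False, True], repeat=len(free)):
        x = list(instance)
        for i, v in zip(free, values):
            x[i] = v
        if f(tuple(x)) != decision:
            return False
    return True


def sufficient_reasons(f, instance):
    """Enumerate minimal sufficient feature sets (as sets of indices).

    Subsets are visited in order of increasing size, so any subset that
    contains an already-found reason cannot be minimal and is skipped.
    """
    found = []
    n = len(instance)
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            s = set(subset)
            if any(r <= s for r in found):
                continue
            if is_sufficient(f, instance, s):
                found.append(s)
    return found


reasons_positive = sufficient_reasons(classifier, (True, True, True))
reasons_negative = sufficient_reasons(classifier, (True, False, False))
```

For the instance with all three features true, the positive decision has two sufficient reasons under this toy classifier: work experience alone (index 2), or the exam together with grades (indices 0 and 1). For the instance `(True, False, False)`, the negative decision has a single sufficient reason: poor grades together with no work experience (indices 1 and 2).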
