Abstract

Explaining decisions is at the heart of explainable AI. We investigate the computational complexity of providing a formally correct and minimal explanation of a decision taken by a classifier. In the case of threshold (i.e. score-based) classifiers, we show that a complexity dichotomy follows from the complexity dichotomy for languages of cost functions. In particular, submodular classifiers allow tractable explanation of positive decisions, but not negative decisions (assuming P≠NP). This is an example of the possible asymmetry between the complexity of explaining positive and negative decisions of a particular classifier. Nevertheless, there are large families of classifiers for which explaining both positive and negative decisions is tractable, such as monotone or modular (e.g. linear) classifiers. We extend the characterisation of tractable cases to constrained classifiers (when there are constraints on the possible input vectors) and to the search for contrastive rather than abductive explanations. Indeed, we show that the tractable classes coincide for abductive and contrastive explanations in both the constrained and unconstrained settings. We show the intractability of returning a set of k diverse explanations even for linear classifiers and k=2. Finding a minimum-cardinality explanation is tractable for the family of modular classifiers, i.e. when the score function is the sum of unary functions, but becomes intractable when any non-modular function is also allowed.
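
To give a concrete feel for the tractable case of modular (linear) threshold classifiers, the sketch below computes a subset-minimal abductive explanation of a positive decision by greedy feature deletion over Boolean inputs. This is an illustrative example only, not the algorithm from the paper; the function names, the Boolean-domain assumption, and the toy classifier are assumptions made for this sketch.

```python
def is_sufficient(weights, bias, threshold, x, fixed):
    """Check whether fixing the features in `fixed` to their values in x
    guarantees score >= threshold for every completion of the free
    (Boolean) features, i.e. under the worst-case free assignment."""
    score = bias
    for j, w in enumerate(weights):
        if j in fixed:
            score += w * x[j]
        else:
            # adversarial value of a free Boolean feature: whichever of 0/1 minimises the score
            score += min(0, w)
    return score >= threshold


def minimal_abductive_explanation(weights, bias, threshold, x):
    """Greedy subset-minimal abductive explanation of a positive decision
    by a linear (modular) threshold classifier over Boolean features."""
    fixed = set(range(len(weights)))
    assert is_sufficient(weights, bias, threshold, x, fixed), "x must be classified positively"
    for j in range(len(weights)):
        # drop feature j if the remaining fixed features still force a positive decision
        if is_sufficient(weights, bias, threshold, x, fixed - {j}):
            fixed.remove(j)
    return fixed


# Toy example: score = 2*x0 + 1*x1 - 1*x2, threshold 2, instance x = (1, 1, 0)
# yields the explanation {0, 2}: fixing x0=1 and x2=0 already guarantees a positive decision.
print(minimal_abductive_explanation([2, 1, -1], 0, 2, [1, 1, 0]))
```

Because each sufficiency check is a single pass over the weights, the whole procedure runs in polynomial time, which is consistent with the tractability claim for modular classifiers; for general (non-modular) score functions the sufficiency check itself can become intractable.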
