Abstract

The Shapley value has become popular in the Explainable AI (XAI) literature, thanks, to a large extent, to a solid theoretical foundation, including four “favourable and fair” axioms for attribution in transferable utility games. The Shapley value is provably the only solution concept satisfying these axioms. In this paper, we introduce the Shapley value and draw attention to its recent uses as a feature selection tool. We call into question this use of the Shapley value, using simple, abstract “toy” counterexamples to illustrate that the axioms may work against the goals of feature selection. From this, we develop a number of insights that are then investigated in concrete simulation settings, with a variety of Shapley value formulations, including SHapley Additive exPlanations (SHAP) and Shapley Additive Global importancE (SAGE). The aim is not to encourage any use of the Shapley value for feature selection, but to clarify various limitations around its current use in the literature. In so doing, we hope to help demystify certain aspects of the Shapley value axioms that are viewed as “favourable”. In particular, we wish to highlight that the favourability of the axioms depends non-trivially on the way in which the Shapley value is appropriated in the XAI application.
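
For context, the definition and axioms referred to above are the standard game-theoretic ones; the notation below (a game v and attribution φ_i) follows the usual convention rather than the paper's own presentation. Given a transferable utility game v : 2^F → ℝ with v(∅) = 0, the Shapley value of player (feature) i is

```latex
\varphi_i(v) \;=\; \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,\bigl(|F|-|S|-1\bigr)!}{|F|!}\,
  \bigl(v(S \cup \{i\}) - v(S)\bigr),
```

and the four axioms usually cited are efficiency (the attributions sum to v(F)), symmetry (interchangeable players receive equal attributions), the null/dummy player axiom (a player whose marginal contribution is zero in every coalition receives zero) and additivity over games; the Shapley value is the unique attribution satisfying all four.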

Highlights

  • The problem of feature selection in Machine Learning (ML) consists of selecting some subset S of a set F of |F| = d feature indices, such that the submodel formed from the features indexed by S maximises some evaluation function C(S) of the submodel, while minimising a cost that increases with |S| (see the sketch following these highlights).

  • The ML methods that stand out in terms of popularity are SHapley Additive exPlanations (SHAP) [15], [17], Shapley Effects [6] and Shapley Additive Global importancE (SAGE) [24], though the Shapley value itself carries a rich history of investigation in the context of game theory – Lloyd Shapley’s 1953 seminal paper [27] has over 9000 citations, and the concept has attracted the attention of various Nobel prize-winning economists [28]–[34].

  • The axioms do not in general provide any guarantee that the Shapley value is suited to feature selection, and may, in some cases, imply the opposite.
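
As a concrete illustration of the cost-benefit framing of feature selection in the first highlight, here is a minimal sketch (not taken from the paper) in which C(S) is the cross-validated accuracy of a submodel trained on the features indexed by S, and the cost is a simple penalty proportional to |S|. The model choice, the penalty weight and the exhaustive search over subsets are all illustrative assumptions.

```python
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Illustrative data: d = 6 candidate features, indexed by F = {0, ..., 5}.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
F = list(range(X.shape[1]))


def C(S):
    """Evaluation function C(S): cross-validated accuracy of the submodel
    trained only on the features indexed by S (the empty set scores 0)."""
    if not S:
        return 0.0
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X[:, list(S)], y, cv=5).mean()


# A cost increasing in |S|; the weight 0.01 is an arbitrary illustrative choice.
cost = lambda S: 0.01 * len(S)

# Exhaustive search over all subsets of F (feasible only for small d).
best_S = max((frozenset(S) for r in range(len(F) + 1)
              for S in combinations(F, r)),
             key=lambda S: C(S) - cost(S))
print("selected feature subset:", sorted(best_S))
```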

Summary

INTRODUCTION

A similar (and more general) problem – model selection – has deep roots in computational statistics [1], where attention is paid to inferential nuances like quantification of uncertainty, significance testing, confounding predictors, collinearity, and the design of experiments. It was in this literature that the Shapley value was first applied to linear regression models, with its own history of discourse (see [2]–[7] and the more critical [8], which traces development to [9]–[11], with reinventions by [12] and [13]). Our goal is to draw scrutiny towards the Shapley value axioms, and attention towards the generality of the game theoretic formulation. We do this in a specific sub-context of feature selection (characterised by Algorithm 1), which we take to be an archetype of the “naïve” application of Shapley values to feature selection; a sketch of such a procedure is given below.

“Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question...” – John Tukey [50]
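
The paper characterises the naïve procedure as its Algorithm 1, which is not reproduced in this summary; the following is therefore only a hedged sketch of one common pattern it resembles: estimate each feature's Shapley value with respect to a performance game v(S), via Monte Carlo sampling over feature orderings, then keep the top-ranked features. The names (v, shapley_estimate), the choice of k and the sampling scheme are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
d = X.shape[1]


def v(S):
    """Cooperative-game value of a coalition S of feature indices: here the
    cross-validated accuracy of a submodel using only those features."""
    if not S:
        return 0.0
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X[:, sorted(S)], y, cv=5).mean()


def shapley_estimate(n_perms=50):
    """Monte Carlo estimate of each feature's Shapley value: the average
    marginal contribution v(S ∪ {i}) - v(S) over random feature orderings."""
    phi = np.zeros(d)
    for _ in range(n_perms):
        order = rng.permutation(d)
        S, v_S = set(), v(set())
        for i in order:
            v_Si = v(S | {i})
            phi[i] += v_Si - v_S
            S.add(i)
            v_S = v_Si
    return phi / n_perms


# "Naive" Shapley-based feature selection: rank features by estimated
# Shapley value and keep the top k (k is an arbitrary illustrative choice).
phi = shapley_estimate()
k = 3
selected = np.argsort(phi)[::-1][:k]
print("Shapley value estimates:", np.round(phi, 3))
print("selected features:", sorted(selected.tolist()))
```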

THE SHAPLEY VALUE
THE MEANING OF MODEL AVERAGING
EXPERIMENTATION
MARKOV BOUNDARY EXPERIMENT 1
A SECRET HOLDER EXPERIMENT
In this experiment we consider the DGP
A TAXICAB EXPERIMENT
DISCUSSION