Abstract

The Shapley value has become popular in the Explainable AI (XAI) literature, thanks, to a large extent, to a solid theoretical foundation, including four “favourable and fair” axioms for attribution in transferable utility games. The Shapley value is provably the only solution concept satisfying these axioms. In this paper, we introduce the Shapley value and draw attention to its recent uses as a feature selection tool. We call into question this use of the Shapley value, using simple, abstract “toy” counterexamples to illustrate that the axioms may work against the goals of feature selection. From this, we develop a number of insights that are then investigated in concrete simulation settings, with a variety of Shapley value formulations, including SHapley Additive exPlanations (SHAP) and Shapley Additive Global importancE (SAGE). The aim is not to encourage any use of the Shapley value for feature selection, but to clarify various limitations around its current use in the literature. In so doing, we hope to help demystify certain aspects of the Shapley value axioms that are viewed as “favourable”. In particular, we wish to highlight that the favourability of the axioms depends non-trivially on the way in which the Shapley value is appropriated in the XAI application.
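
For context, the definition and axioms referred to above are the standard game-theoretic ones; the notation below (a game v and attribution φ_i) follows the usual convention rather than the paper's own presentation. Given a transferable utility game v : 2^F → ℝ with v(∅) = 0, the Shapley value of player (feature) i is

```latex
\varphi_i(v) \;=\; \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,\bigl(|F|-|S|-1\bigr)!}{|F|!}\,
  \bigl(v(S \cup \{i\}) - v(S)\bigr),
```

and the four axioms usually cited are efficiency (the attributions sum to v(F)), symmetry (interchangeable players receive equal attributions), the null/dummy player axiom (a player whose marginal contribution is zero in every coalition receives zero) and additivity over games; the Shapley value is the unique attribution satisfying all four.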

Highlights

  • The problem of feature selection in Machine Learning (ML) consists of selecting some subset S of a set F of |F| = d feature indices, such that the submodel formed from the features indexed by S maximises some evaluation function C(S) of the submodel, while minimising a cost that increases with |S| (see the sketch following these highlights).

  • The ML methods that stand out in terms of popularity are SHapley Additive exPlanations (SHAP) [15], [17], Shapley Effects [6] and Shapley Additive Global importancE (SAGE) [24], though the Shapley value itself carries a rich history of investigation in the context of game theory – Lloyd Shapley’s 1953 seminal paper [27] has over 9000 citations, and the concept has attracted the attention of various Nobel prize-winning economists [28]–[34].

  • The axioms do not in general provide any guarantee that the Shapley value is suited to feature selection, and may, in some cases, imply the opposite.
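
As a concrete illustration of the cost-benefit framing of feature selection in the first highlight, here is a minimal sketch (not taken from the paper) in which C(S) is the cross-validated accuracy of a submodel trained on the features indexed by S, and the cost is a simple penalty proportional to |S|. The model choice, the penalty weight and the exhaustive search over subsets are all illustrative assumptions.

```python
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Illustrative data: d = 6 candidate features, indexed by F = {0, ..., 5}.
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
F = list(range(X.shape[1]))


def C(S):
    """Evaluation function C(S): cross-validated accuracy of the submodel
    trained only on the features indexed by S (the empty set scores 0)."""
    if not S:
        return 0.0
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X[:, list(S)], y, cv=5).mean()


# A cost increasing in |S|; the weight 0.01 is an arbitrary illustrative choice.
cost = lambda S: 0.01 * len(S)

# Exhaustive search over all subsets of F (feasible only for small d).
best_S = max((frozenset(S) for r in range(len(F) + 1)
              for S in combinations(F, r)),
             key=lambda S: C(S) - cost(S))
print("selected feature subset:", sorted(best_S))
```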

Summary

INTRODUCTION

A similar (and more general) problem – model selection – has deep roots in computational statistics [1], where attention is paid to inferential nuances like quantification of uncertainty, significance testing, confounding predictors, collinearity, and the design of experiments. It was in this literature that the Shapley value was first applied to linear regression models, with its own history of discourse (see [2]–[7] and the more critical [8], which traces development to [9]–[11], with reinventions by [12] and [13]). Our goal is to draw scrutiny towards the Shapley value axioms, and attention towards the generality of the game theoretic formulation. We do this in a specific sub-context of feature selection (characterised by Algorithm 1), which we take to be an archetype of the “naïve” application of Shapley values to feature selection; a sketch of such a procedure is given below.

“Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question...” – John Tukey [50]
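
The paper characterises the naïve procedure as its Algorithm 1, which is not reproduced in this summary; the following is therefore only a hedged sketch of one common pattern it resembles: estimate each feature's Shapley value with respect to a performance game v(S), via Monte Carlo sampling over feature orderings, then keep the top-ranked features. The names (v, shapley_estimate), the choice of k and the sampling scheme are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
d = X.shape[1]


def v(S):
    """Cooperative-game value of a coalition S of feature indices: here the
    cross-validated accuracy of a submodel using only those features."""
    if not S:
        return 0.0
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X[:, sorted(S)], y, cv=5).mean()


def shapley_estimate(n_perms=50):
    """Monte Carlo estimate of each feature's Shapley value: the average
    marginal contribution v(S ∪ {i}) - v(S) over random feature orderings."""
    phi = np.zeros(d)
    for _ in range(n_perms):
        order = rng.permutation(d)
        S, v_S = set(), v(set())
        for i in order:
            v_Si = v(S | {i})
            phi[i] += v_Si - v_S
            S.add(i)
            v_S = v_Si
    return phi / n_perms


# "Naive" Shapley-based feature selection: rank features by estimated
# Shapley value and keep the top k (k is an arbitrary illustrative choice).
phi = shapley_estimate()
k = 3
selected = np.argsort(phi)[::-1][:k]
print("Shapley value estimates:", np.round(phi, 3))
print("selected features:", sorted(selected.tolist()))
```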

THE SHAPLEY VALUE
THE MEANING OF MODEL AVERAGING
EXPERIMENTATION
MARKOV BOUNDARY EXPERIMENT 1
A SECRET HOLDER EXPERIMENT
In this experiment we consider the DGP
A TAXICAB EXPERIMENT
DISCUSSION