Abstract

Feature attributions based on the Shapley value are popular for explaining machine learning models. However, their estimation is complex from both theoretical and computational standpoints. We disentangle this complexity into two main factors: the approach to removing feature information and the tractable estimation strategy. These two factors provide a natural lens through which we can better understand and compare 24 distinct algorithms. Based on the various feature-removal approaches, we describe the multiple types of Shapley value feature attributions and the methods to calculate each one. Then, based on the tractable estimation strategies, we characterize two distinct families of approaches: model-agnostic and model-specific approximations. For the model-agnostic approximations, we benchmark a wide class of estimation approaches and tie them to alternative yet equivalent characterizations of the Shapley value. For the model-specific approximations, we clarify the assumptions crucial to each method’s tractability for linear, tree and deep models. Finally, we identify gaps in the literature and promising future research directions.

There are numerous algorithms for generating Shapley value explanations. The authors provide a comprehensive survey of Shapley value feature attribution algorithms by disentangling and clarifying the fundamental challenges underlying their computation.
