Abstract

Two-part models are important to, and used throughout, insurance and actuarial science. Since insurance is required for registering a car, obtaining a mortgage, and participating in certain businesses, it is especially important that the models that price insurance policies are fair and non-discriminatory. Black-box models can make it very difficult to know which covariates are influencing the results, resulting in model risk and bias. SHAP (SHapley Additive exPlanations) values enable interpretation of various black-box models, but little progress has been made in applying them to two-part models. In this paper, we propose mSHAP (or multiplicative SHAP), a method for computing SHAP values of two-part models using the SHAP values of the individual models. This method allows the predictions of two-part models to be explained at the individual-observation level. After developing mSHAP, we perform an in-depth simulation study. Although the kernelSHAP algorithm is also capable of computing approximate SHAP values for a two-part model, a comparison with our method demonstrates that mSHAP is exponentially faster. Ultimately, we apply mSHAP to a two-part ratemaking model for personal auto property damage insurance coverage. Additionally, an R package (mshap) is available to implement the method easily in a wide variety of applications.
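
To make the core idea concrete, the sketch below (in R, the language of the accompanying mshap package) combines the SHAP values of the two component models of a frequency-times-severity prediction into SHAP values for the product. It is a minimal sketch, not the paper's method: the equal split of the cross terms between features is a simplifying assumption chosen for illustration, the function name combine_shap and the numbers are hypothetical, and the mshap package implements the full recommended weighting.

    # Sketch: combine the SHAP values of two component models into SHAP values
    # for their product (e.g., frequency x severity). The equal split of the
    # off-diagonal cross terms is a simplification; the mshap package
    # implements the weighting recommended in the paper.
    combine_shap <- function(s1, s2, mu1, mu2) {
      # s1, s2: SHAP values of one observation for models 1 and 2 (same features)
      # mu1, mu2: expected values (baselines) of models 1 and 2
      p <- length(s1)
      s3 <- mu2 * s1 + mu1 * s2 + s1 * s2  # main-effect and diagonal terms
      for (i in seq_len(p)) {
        for (j in seq_len(p)) {
          if (i != j) {
            # split each off-diagonal cross term equally between features i and j
            s3[i] <- s3[i] + 0.5 * s1[i] * s2[j]
            s3[j] <- s3[j] + 0.5 * s1[i] * s2[j]
          }
        }
      }
      list(shap = s3, expected_value = mu1 * mu2)
    }

    # Hypothetical numbers: the baseline plus the combined SHAP values
    # reproduces the product of the two model predictions (local accuracy).
    s1 <- c(0.10, -0.05, 0.02); mu1 <- 0.30   # e.g., claim frequency model
    s2 <- c(150, -40, 10);      mu2 <- 1200   # e.g., claim severity model
    res <- combine_shap(s1, s2, mu1, mu2)
    all.equal(res$expected_value + sum(res$shap),
              (mu1 + sum(s1)) * (mu2 + sum(s2)))  # TRUE

In practice, s1 and s2 would come from treeSHAP applied to the fitted frequency and severity models, and one would call the packaged implementation rather than this simplified split.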

Highlights

  • While there have been significant advancements in this area, current methods are unable to rapidly assign input contributions to outputs in two-part models. This lack of explainability is an issue in the insurance industry, and here we propose a method of explaining two-part models that works rapidly and effectively

  • Since mSHAP is built on top of treeSHAP, it automatically incorporates the consistency/monotonicity and missingness properties in addition to the local accuracy property on which we focus

Introduction

One of the most popular families of machine learning models is tree-based algorithms, which use the concept of many decision trees working together to create more generalized predictions (Lundberg et al. 2020). Current implementations include random forests, gradient boosted forests, and others. These models are very good at learning relationships and have proven highly accurate in diverse areas. Many aspects of life are affected by these algorithms, as they have been implemented in business, technology, and more. As these methods become more abundant, it is crucial that explanations of model output are available. Although there have been some advances in quantifying the uncertainty around black-box predictions, as in Ablad et al. (2021), we search for more interpretable explanations that relate inputs to model outputs. We will regard an explainable system as what Doran et al. (2017) refer to as a comprehensible system, or one that “allow[s] the user to relate properties of the inputs to their output.”
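
As a minimal illustration (not from the paper) of relating inputs to outputs for a tree-based model, the R sketch below fits a small gradient boosted model with xgboost and extracts per-feature treeSHAP contributions for a single prediction; the dataset (mtcars) and parameter choices are arbitrary and only for demonstration.

    # Illustrative only: fit a gradient boosted tree model and obtain
    # per-feature SHAP contributions via xgboost's built-in treeSHAP.
    library(xgboost)

    X <- as.matrix(mtcars[, c("wt", "hp", "disp", "qsec")])
    y <- mtcars$mpg

    set.seed(1)
    dtrain <- xgb.DMatrix(data = X, label = y)
    bst <- xgb.train(params = list(objective = "reg:squarederror"),
                     data = dtrain, nrounds = 50)

    # predcontrib = TRUE returns one contribution per feature plus a BIAS column;
    # each row sums to the corresponding model prediction (local accuracy).
    contrib <- predict(bst, dtrain, predcontrib = TRUE)
    contrib[1, ]
    sum(contrib[1, ])  # equals predict(bst, dtrain)[1]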
