Abstract

Prediction methods can be augmented by local explanation methods (LEMs) to perform root cause analysis for individual observations. But while most recent research on LEMs focuses on low-dimensional problems, real-world datasets commonly have hundreds or thousands of variables. Here, we investigate how LEMs perform for high-dimensional industrial applications. Seven prediction methods (penalized logistic regression, LASSO, gradient boosting, random forest and support vector machines) and three LEMs (TreeExplainer, Kernel SHAP, and the conditional normal sampling importance (CNSI)) were combined into twelve explanation approaches. These approaches were used to compute explanations for simulated data and for real-world industrial data with simulated responses. The approaches were ranked by how well their explanations matched the contributions given by the true models. For the simulation experiment, the generalized linear methods provided the best explanations, while gradient boosting with either TreeExplainer or CNSI, and random forest with CNSI, were robust for all relationships. For the real-world experiment, TreeExplainer performed similarly, while the explanations from CNSI were significantly worse. The generalized linear models were fastest, followed by TreeExplainer, while CNSI and Kernel SHAP required several orders of magnitude more computation time. In conclusion, local explanations can be computed for high-dimensional data, but the choice of statistical tools is crucial.
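
The ranking step described above, scoring each explanation approach against contributions known from the true model, can be illustrated with a minimal sketch. The abstract does not state which error measure was used, so the mean absolute error here is an assumption, and the function names, approach names, and toy numbers are all hypothetical.

```python
from statistics import mean


def explanation_error(estimated, true):
    """Mean absolute error between estimated and true contributions.

    Both arguments are lists of rows: one row per observation,
    one value per variable. (The error measure is an assumption.)
    """
    return mean(
        abs(e - t)
        for est_row, true_row in zip(estimated, true)
        for e, t in zip(est_row, true_row)
    )


def rank_approaches(explanations, true):
    """Return approach names ordered from lowest to highest error, plus the errors."""
    errors = {name: explanation_error(est, true) for name, est in explanations.items()}
    return sorted(errors, key=errors.get), errors


# Hypothetical toy data: 2 observations, 3 variables.
true_contrib = [[1.0, 0.0, -0.5], [0.2, 0.0, 0.3]]
explanations = {
    "approach_a": [[0.9, 0.1, -0.4], [0.2, 0.0, 0.2]],
    "approach_b": [[0.0, 0.5, 0.5], [1.0, -0.3, 0.0]],
}

ranking, errors = rank_approaches(explanations, true_contrib)
print(ranking)
```

Because the true contributions are only available when the response is simulated, a comparison of this kind is possible for simulated data and for real-world predictors paired with simulated responses, which matches the two experiments described.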
