Abstract

Shapley value regression with machine learning models has recently emerged as an axiomatic approach to the development of diagnostic models. However, when large numbers of predictor variables have to be considered, these methods become infeasible, owing to the prohibitive computational cost. In this paper, an approximate Shapley value approach with random forests is compared with a full Shapley model, as well as with other methods used in variable importance analysis. Three case studies are considered: one based on simulated data, a model predicting throughput in a calcium carbide furnace as a function of operating variables, and a case study related to energy consumption in a steel plant. The approximate Shapley approach achieved results very similar to those of the full Shapley approach, but at a fraction of the computational cost. Moreover, although the variable importance measures considered in this study consistently identified the most influential predictors in the case studies, they yielded different results for the less influential predictors, and no single variable importance measure outperformed the others across all three case studies.
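The idea underlying the approach can be illustrated with a minimal sketch, assuming a Monte Carlo formulation of Shapley value regression in which the "worth" of a feature coalition is the goodness of fit of a random forest trained on that subset, and marginal contributions are averaged over randomly sampled feature orderings instead of all 2^p subsets. The data, variable names, and hyperparameters below are illustrative and not taken from the paper.

```python
# Illustrative Monte Carlo approximation of Shapley value regression.
# Value function v(S): R^2 of a small random forest fitted on feature subset S.
# (For brevity R^2 is computed on the training data; in practice a
# validation set or out-of-bag estimate would be used.)
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Simulated data: y depends strongly on x0, weakly on x1, not on x2 or x3.
n, p = 300, 4
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

_cache = {}

def v(features):
    """Worth of a coalition: training R^2 of a forest on those features."""
    key = tuple(sorted(features))
    if key not in _cache:
        if not key:
            _cache[key] = 0.0
        else:
            model = RandomForestRegressor(n_estimators=30, random_state=0)
            model.fit(X[:, list(key)], y)
            _cache[key] = model.score(X[:, list(key)], y)
    return _cache[key]

def shapley_importance(n_perms=20):
    """Average marginal R^2 gain of each feature over sampled orderings."""
    phi = np.zeros(p)
    for _ in range(n_perms):
        order = rng.permutation(p)
        coalition = []
        for j in order:
            before = v(coalition)
            coalition.append(int(j))
            phi[j] += v(coalition) - before
    return phi / n_perms

phi = shapley_importance()
# Expect phi[0] to dominate, with phi[2] and phi[3] near zero.
```

With p features, the exact computation requires 2^p coalition models, whereas the sampled version needs at most n_perms × p model fits (far fewer once coalitions are cached), which is what makes approximate Shapley approaches tractable for large numbers of predictors.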
