Abstract

Linear regression is often used as a diagnostic tool to understand the relative contributions of operational variables to some key performance indicator or response variable. However, owing to the nature of plant operations, predictor variables tend to be correlated, often highly so, and this can lead to significant complications in assessing the importance of these variables. Shapley regression is seen as the only axiomatic approach to deal with this problem but has almost exclusively been used with linear models to date. In this paper, the approach is extended to random forests, and the results are compared with some of the empirical variable importance measures widely used with these models, i.e., the permutation and Gini variable importance measures. Four case studies are considered, of which two are based on simulated data and two on real-world data from the mineral processing industries. These case studies suggest that the random forest Shapley variable importance measure may be a more reliable indicator of the influence of predictor variables than the other measures that were considered. Moreover, the results obtained with the Gini variable importance measure were as reliable as, or more reliable than, those obtained with the permutation measure of the random forest.

Highlights

  • Insight into the underlying physical phenomena in process systems is key to the development of reliable process models

  • Linear regression is well established as a diagnostic tool to understand the relative contributions or the statistical significance of operational variables to some key performance indicator or response variable on process plants

  • The Shapley variable importance measure is derived from Equation (1), where the R²-values were generated by both linear regression and random forest regression models in this study
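
The last highlight describes a Shapley importance built from R²-values. Equation (1) is not reproduced on this page, so the following is only a minimal sketch that assumes it is the standard Shapley value formula with R² as the value function. The names `shapley_importance` and `r2_of_subset` are hypothetical, and the toy R² table contains made-up numbers for illustration, not results from the paper. In practice, `r2_of_subset` would fit a linear regression or random forest model on each subset of predictors and return its R².

```python
from itertools import combinations
from math import factorial


def shapley_importance(predictors, r2_of_subset):
    """Shapley share of R^2 for each predictor.

    predictors: list of predictor names.
    r2_of_subset: callable mapping a frozenset of names to the R^2 of a
        model fitted on only those predictors (hypothetical; caller-supplied).
    """
    m = len(predictors)
    importance = {}
    for j in predictors:
        others = [p for p in predictors if p != j]
        total = 0.0
        for k in range(m):  # k = size of a subset that excludes j
            # Standard Shapley weight for a coalition of size k out of m players.
            weight = factorial(k) * factorial(m - k - 1) / factorial(m)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal gain in R^2 from adding predictor j to the subset.
                total += weight * (r2_of_subset(s | {j}) - r2_of_subset(s))
        importance[j] = total
    return importance


# Toy R^2 table for two correlated predictors (made-up numbers, illustration only).
toy_r2 = {
    frozenset(): 0.0,
    frozenset({"x1"}): 0.50,
    frozenset({"x2"}): 0.40,
    frozenset({"x1", "x2"}): 0.60,
}
toy_importance = shapley_importance(["x1", "x2"], toy_r2.__getitem__)
```

In this toy case the two importances sum to the R² of the full model (0.60), which reflects the efficiency axiom of Shapley values. The loop visits every subset of predictors, so the cost grows exponentially with the number of predictors.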


Summary

Introduction

Insight into the underlying physical phenomena in process systems is key to the development of reliable process models. Random forests are a popular approach to developing such models. They are robust, i.e., they contain few hyperparameters to be specified by the user, and they can be used to quantitatively assess the contributions of predictor variables to the response. These models have been used in variable importance analysis in a wide range of technical disciplines, including the mineral processing industries. Examples of these applications include the use of such models in comminution [7,8,9,10], froth flotation systems [11,12,13,14,15,16], sensor-based ore sorting [17], and blast fragmentation from open pit mines [18].

Methodology
Axiomatic Properties of Shapley Values
Random Forests
Decision Trees
Ensembles of Decision Trees
Splitting Criteria
Variable Importance Measures
Shapley Variable Importance with Random Forests and Linear Regression Models
Permutation Variable Importance
Gini Variable Importance
Case Studies
Nonlinear System with Strongly Correlated Predictors
Effect of the random forest hyperparameter mtry model performance in Case
Consumption of Leaching
13. Variable
Findings
Discussion and Conclusions