Abstract

In the steel industry, defects may occur during the manufacturing process. It is therefore important to predict the occurrence of defects in steel products online and to identify the causal variables that may lead to them. However, the characteristics of observed defect count data, namely nonnegative integer values and high overdispersion, pose difficulties for traditional probability models. To address this issue, the present work employs random forests to model and analyze the observed defect count data. Random forests are a nonlinear ensemble learning technique that constructs multiple regression trees during the training phase and then predicts the output by averaging the predictions of the individual trees. Unlike traditional probability models, which rely on a specific distributional assumption, random forests are a non-parametric, distribution-free model. Furthermore, random forests ensure that predictions are nonnegative, which makes them suitable for modeling defect count data. In addition, partial dependence analysis, in conjunction with the variable importance measure, was used to identify the causal variables. Application results on a real steelmaking process demonstrate that random forests outperform the partial least squares (PLS), support vector regression (SVR), Poisson, and negative binomial (NB) methods in prediction accuracy. Moreover, the most influential variables identified by random forests are in line with operator experience.
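The averaging mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the function names (`build_tree`, `fit_forest`, `predict_forest`) and all parameter values are assumptions chosen for illustration, and the trees are deliberately simplified. The sketch shows why predictions cannot be negative: every leaf stores the mean of nonnegative training counts, and the forest averages those leaf values.

```python
import random

def avg(values):
    return sum(values) / len(values)

def build_tree(X, y, rng, depth=0, max_depth=3, min_leaf=2):
    """Grow a small regression tree; a leaf is the mean of its training counts."""
    if depth >= max_depth or len(set(y)) == 1:
        return avg(y)
    n_features = len(X[0])
    k = max(1, n_features // 3)  # random feature subset tried at each split
    best = None  # (sum of squared errors, feature index, threshold)
    for j in rng.sample(range(n_features), k):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml, mr = avg(left), avg(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t)
    if best is None:  # no admissible split: fall back to a leaf
        return avg(y)
    _, j, t = best
    li = [i for i, row in enumerate(X) if row[j] <= t]
    ri = [i for i, row in enumerate(X) if row[j] > t]
    left = build_tree([X[i] for i in li], [y[i] for i in li], rng, depth + 1, max_depth, min_leaf)
    right = build_tree([X[i] for i in ri], [y[i] for i in ri], rng, depth + 1, max_depth, min_leaf)
    return (j, t, left, right)

def predict_tree(node, row):
    # Internal nodes are tuples; leaves are plain floats.
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def fit_forest(X, y, n_trees=25, seed=0):
    """Train each tree on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """Forest prediction: the average of the individual tree predictions."""
    return avg([predict_tree(tree, row) for tree in forest])

# Synthetic stand-in for defect counts (hypothetical): many zeros, occasional bursts.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(60)]
y = [int(data_rng.expovariate(1.0) * 10 * row[0]) if data_rng.random() < 0.4 else 0
     for row in X]

forest = fit_forest(X, y)
print(predict_forest(forest, [0.9, 0.5, 0.5]))
```

Because no distribution is fitted, the same procedure applies unchanged to overdispersed counts. Note that the prediction is a nonnegative real number rather than an integer.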
