Forward stepwise random forest analysis for experimental designs

Chang-Yun Lin

doi:10.1080/00224065.2020.1865853

Abstract

In experimental designs, it is usually assumed that the data follow normal distributions and that the models have linear structures. In practice, experimenters may encounter different types of responses and be uncertain about model structures. In such cases, traditional methods such as ANOVA and regression are not suitable for data analysis and model selection. We introduce random forest analysis, a powerful machine learning method capable of analyzing numerical and categorical data with complicated model structures. To perform model selection and factor identification with the random forest method, we propose a forward stepwise algorithm and develop Python and R code based on minimizing the out-of-bag (OOB) error. Six examples, including simulations and case studies, are provided. We compare the performance of the proposed method with that of some frequently used analysis methods. Results show that the forward stepwise random forest analysis, in general, has high power for identifying active factors and selects models that have high prediction accuracy.
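The idea of forward stepwise selection driven by the OOB error can be illustrated with a minimal Python sketch. This is not the author's published code: the function names `forward_stepwise` and `rf_oob_error`, the `null_error` argument, and the stopping rule (stop as soon as no added factor lowers the error) are assumptions made for illustration. The OOB helper further assumes that scikit-learn is installed, that the response is categorical, and that the factors are columns of a pandas DataFrame.

```python
from typing import Callable, List, Sequence, Tuple


def forward_stepwise(
    factors: Sequence[str],
    error_of: Callable[[Tuple[str, ...]], float],
    null_error: float = float("inf"),
) -> Tuple[List[str], float]:
    """Greedy forward selection (illustrative stopping rule, an assumption).

    At each step, add the single factor whose inclusion gives the lowest
    error; stop when no remaining factor improves on the current error.
    """
    selected: List[str] = []
    remaining = list(factors)
    best_err = null_error
    while remaining:
        # Score every candidate model that adds one more factor.
        trials = [(error_of(tuple(selected + [f])), f) for f in remaining]
        err, factor = min(trials)
        if err >= best_err:
            break  # no candidate lowers the error, so stop
        selected.append(factor)
        remaining.remove(factor)
        best_err = err
    return selected, best_err


def rf_oob_error(X, y, columns, n_trees=500, seed=0):
    """OOB misclassification rate of a random forest fitted on `columns`.

    Hypothetical helper: assumes scikit-learn, a pandas DataFrame `X`,
    and a categorical response `y`.
    """
    from sklearn.ensemble import RandomForestClassifier

    model = RandomForestClassifier(
        n_estimators=n_trees, oob_score=True, random_state=seed
    )
    model.fit(X[list(columns)], y)
    return 1.0 - model.oob_score_
```

With a data frame `X` and response `y` in hand, one possible call is `forward_stepwise(list(X.columns), lambda cols: rf_oob_error(X, y, cols))`. Passing the error function as an argument keeps the search loop independent of the forest implementation, so a regression forest or an R-backed scorer could be substituted.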

Full Text