Abstract Tree-based methods are being both successfully applied and critically discussed in corpus linguistics. In this article, we contribute several methodological points to this discussion. These include the interpretation of interaction effects in single trees and random forests, as well as more general issues such as stability and overfitting. In particular, we conducted a simulation study that investigates, more systematically than the previous literature, an approach suggested by Gries for computing the importance of interactions in random forests. The results of this simulation study show that, even when interaction predictors are explicitly added, the permutation variable importance is not suited for distinguishing between main effects and interaction effects or between interaction effects of different orders. We also discuss the use of partial dependence (PD) and individual conditional expectation (ICE) plots for illustrating the functional form of effects and potential interaction effects, as well as other means of interpretable machine learning.