Abstract

The infinitesimal jackknife (IJ) has recently been applied to the random forest to estimate its prediction variance. The theoretical results supporting the IJ were verified under a traditional random forest framework that uses classification and regression trees and bootstrap resampling. However, random forests using conditional inference trees and subsampling have been found not to be prone to variable selection bias. Here, we conduct simulation experiments using a novel approach to explore the applicability of the IJ to random forests built with variations on the resampling method and base learner. Test data points were simulated, and predictions for each were obtained from random forests trained on one hundred simulated training data sets under different combinations of resampling method and base learner. Using conditional inference trees instead of traditional classification and regression trees, as well as using subsampling instead of bootstrap sampling, resulted in markedly more accurate IJ estimates of prediction variance. The random forest variations described here have been incorporated into an open‐source software package for the R programming language. Copyright © 2017 John Wiley & Sons, Ltd.
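The IJ variance estimate for a bagged ensemble can be summarized as the sum over training points of the squared covariance between each point's resample count and the base learners' predictions at the test point. Below is a minimal sketch of that computation in Python, assuming the standard formulation; the stub "base learner" (the mean of its resample) and all names are illustrative, not part of the paper's actual R implementation.

```python
import random

def ij_variance(counts, preds):
    """Infinitesimal jackknife variance estimate for a bagged ensemble.

    counts[b][i] -- number of times training point i appears in resample b
    preds[b]     -- prediction of the b-th base learner at the test point
    Returns sum over i of Cov(N_i, t)^2 across the B resamples.
    """
    B = len(preds)
    n = len(counts[0])
    t_bar = sum(preds) / B
    var = 0.0
    for i in range(n):
        n_bar = sum(counts[b][i] for b in range(B)) / B
        cov = sum((counts[b][i] - n_bar) * (preds[b] - t_bar)
                  for b in range(B)) / B
        var += cov * cov
    return var

# Toy demonstration: each "tree" is just the mean of its bootstrap resample,
# so the ensemble prediction is a bagged mean and the IJ estimate should be
# close to var(y)/n.
random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(50)]
n = len(y)
B = 500
counts, preds = [], []
for _ in range(B):
    idx = [random.randrange(n) for _ in range(n)]  # bootstrap resample
    c = [0] * n
    for j in idx:
        c[j] += 1
    counts.append(c)
    preds.append(sum(y[j] for j in idx) / n)       # stub base learner
print(ij_variance(counts, preds))
```

Swapping the bootstrap loop for sampling without replacement (subsampling) is the resampling variation the abstract refers to; the estimator itself is unchanged, only the `counts` become 0/1 indicators.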
