Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival.

Hemant Ishwaran, Min Lu

doi:10.1002/sim.7803

Abstract

Random forests are a popular nonparametric tree ensemble procedure with broad applications to data analysis. While their widespread popularity stems from their prediction performance, an equally important feature is that they provide a fully nonparametric measure of variable importance (VIMP). A current limitation of VIMP, however, is that no systematic method exists for estimating its variance. As a solution, we propose a subsampling approach that can be used to estimate the variance of VIMP and to construct confidence intervals. The method is general enough that it can be applied to many useful settings, including regression, classification, and survival problems. Using extensive simulations, we demonstrate the effectiveness of the subsampling estimator and in particular find that the delete-d jackknife variance estimator, a close cousin, is especially effective under low subsampling rates due to its bias correction properties. These 2 estimators are highly competitive when compared with the .164 bootstrap estimator, a modified bootstrap procedure designed to deal with ties in out-of-sample data. Most importantly, subsampling is computationally fast, thus making it especially attractive for big data settings.
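The two variance estimators named in the abstract can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: the sample mean stands in for VIMP (no forest is fit), the function names `subsample_variances` and `normal_ci` are hypothetical, and the scaling factors b/n (subsampling, centered at the mean of the subsampled estimates) and b/(n - b) (delete-d jackknife with d = n - b, centered at the full-sample estimate) are the standard textbook forms rather than formulas quoted from the paper.

```python
import math
import random
import statistics
from typing import Callable, Dict, Sequence, Tuple


def subsample_variances(
    data: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    b: int,
    n_subsamples: int = 200,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate Var(statistic(data)) from subsamples of size b drawn without replacement.

    `statistic` is a stand-in for VIMP (an assumption of this sketch).
    """
    n = len(data)
    if not 1 <= b < n:
        raise ValueError("b must satisfy 1 <= b < n")
    rng = random.Random(seed)
    pool = list(data)

    theta_full = statistic(pool)  # estimate on the full sample
    thetas = [statistic(rng.sample(pool, b)) for _ in range(n_subsamples)]
    theta_bar = sum(thetas) / n_subsamples

    # Mean squared deviation around the subsample average (subsampling)
    # and around the full-sample estimate (delete-d jackknife).
    msd_mean = sum((t - theta_bar) ** 2 for t in thetas) / n_subsamples
    msd_full = sum((t - theta_full) ** 2 for t in thetas) / n_subsamples

    return {
        "estimate": theta_full,
        "subsampling": (b / n) * msd_mean,
        "jackknife": (b / (n - b)) * msd_full,
    }


def normal_ci(estimate: float, variance: float, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval (assumes asymptotic normality)."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width
```

Calling `subsample_variances(data, statistics.fmean, b=40)` on a sample of a few hundred values returns both variance estimates, and `normal_ci` turns either into an interval. Because the jackknife form centers at the full-sample estimate and uses the larger factor b/(n - b), it is never smaller than the subsampling form in this sketch, which is consistent with the bias correction the abstract attributes to it at low subsampling rates.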


Journal: Statistics in Medicine
Publication Date: Jun 4, 2018
Citations: 190

