Abstract

Background: Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n large p" problems, complex interactions and even highly correlated predictor variables. Their variable importance measures have recently been suggested as screening tools for, e.g., gene expression studies. However, these variable importance measures show a bias towards correlated predictor variables.

Results: We identify two mechanisms responsible for this finding: (i) a preference for the selection of correlated predictors in the tree building process and (ii) an additional advantage for correlated predictor variables induced by the unconditional permutation scheme that is employed in the computation of the variable importance measure. Based on these considerations we develop a new, conditional permutation scheme for the computation of the variable importance measure.

Conclusion: The resulting conditional variable importance reflects the true impact of each predictor variable more reliably than the original marginal approach.
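
To make the distinction concrete, below is a minimal Python sketch (an assumption-laden illustration using scikit-learn and synthetic data, not the authors' R implementation) contrasting the marginal, unconditional permutation scheme with a simplified conditional scheme in which the variable of interest is permuted only within groups of a correlated covariate. The quantile-bin conditioning used here is a crude stand-in for the tree-based conditioning grid described in the paper; all variable names and parameters are illustrative.

```python
# Minimal sketch (assumption: scikit-learn + synthetic data; NOT the authors'
# party/R implementation). Contrasts marginal (unconditional) permutation
# importance with a simplified conditional scheme that permutes the variable
# of interest only within quantile bins of a correlated covariate.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.44 * rng.normal(size=n)   # correlated with x1, no own effect on y
x3 = rng.normal(size=n)                     # uncorrelated, relevant
y = 2.0 * x1 + 1.0 * x3 + rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

def mse(X_eval):
    return float(np.mean((y - forest.predict(X_eval)) ** 2))

base_error = mse(X)

def marginal_importance(j):
    """Unconditional scheme: permute column j over all observations."""
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    return mse(Xp) - base_error

def conditional_importance(j, cond, n_bins=10):
    """Simplified conditional scheme: permute column j within bins of column cond."""
    Xp = X.copy()
    edges = np.quantile(X[:, cond], np.linspace(0, 1, n_bins + 1)[1:-1])
    groups = np.digitize(X[:, cond], edges)
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        Xp[idx, j] = rng.permutation(Xp[idx, j])
    return mse(Xp) - base_error

for j, name in enumerate(["x1", "x2", "x3"]):
    cond = 1 if j == 0 else 0   # condition on the most strongly correlated other predictor
    print(f"{name}: marginal = {marginal_importance(j):.3f}, "
          f"conditional = {conditional_importance(j, cond):.3f}")
```

Unconditionally permuting a predictor destroys not only its association with the response but also its correlation with the other predictors, which is what inflates the marginal importance of correlated but irrelevant variables; permuting within groups preserves the correlation structure under the null.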

Highlights

  • Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n large p" problems, complex interactions and even highly correlated predictor variables

  • The variable importance measures yielded by random forests have been suggested for the selection of relevant predictor variables in the analysis of microarray data, DNA sequencing and other applications [2,3,4,5]

  • In this case a key advantage of random forest variable importance measures, as compared to univariate screening methods, is that they cover the impact of each predictor variable individually as well as in multivariate interactions with other predictor variables



Introduction

Within the past few years, random forests [1] have become a popular and widely-used tool for non-parametric regression in many scientific areas. They show high predictive accuracy and are applicable even in high-dimensional problems with highly correlated variables, a situation which often occurs in bioinformatics. In this case a key advantage of random forest variable importance measures, as compared to univariate screening methods, is that they cover the impact of each predictor variable individually as well as in multivariate interactions with other predictor variables.
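
The following Python sketch (a hedged illustration using scikit-learn and simulated data; it is not taken from the paper) shows the kind of multivariate effect a univariate screen misses: two predictors that act on the response only through their interaction.

```python
# Hedged illustration (assumption: scikit-learn + synthetic data; not from the
# paper): x1 and x2 affect y only through their interaction, so a univariate
# screen (per-variable F-test) sees almost nothing, while the forest's
# permutation importance attributes signal to both.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import f_regression
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))                     # columns: x1, x2 (interacting), x3 (noise)
y = 3.0 * X[:, 0] * X[:, 1] + rng.normal(size=n)

f_scores, _ = f_regression(X, y)                # univariate screening statistic
forest = RandomForestRegressor(n_estimators=300, random_state=1).fit(X, y)
perm = permutation_importance(forest, X, y, n_repeats=10, random_state=1)

for j in range(3):
    print(f"x{j + 1}: univariate F = {f_scores[j]:6.2f}   "
          f"forest permutation importance = {perm.importances_mean[j]:.3f}")
```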

