Do little interactions get lost in dark random forests?

Marvin N Wright, Andreas Ziegler, Inke R König

doi:10.1186/s12859-016-0995-8

Abstract

Background: Random forests have often been claimed to uncover interaction effects. However, if and how interaction effects can be differentiated from marginal effects remains unclear. In extensive simulation studies, we investigate whether random forest variable importance measures capture or detect gene-gene interactions. By capturing interactions, we mean the ability to identify a variable that acts through an interaction with another one, while detection is the ability to identify an interaction effect as such.

Results: Of the single importance measures, the Gini importance captured interaction effects in most of the simulated scenarios; however, they were masked by marginal effects in other variables. With the permutation importance, the proportion of captured interactions was lower in all cases. Pairwise importance measures performed about equally, with a slight advantage for the joint variable importance method. However, the overall fraction of detected interactions was low. In almost all scenarios, the detection fraction in a model with only marginal effects was larger than in a model with an interaction effect only.

Conclusions: Random forests are generally capable of capturing gene-gene interactions, but current variable importance measures are unable to detect them as interactions. In most cases, interactions are masked by marginal effects and cannot be differentiated from marginal effects. Consequently, caution is warranted when claiming that random forests uncover interactions.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0995-8) contains supplementary material, which is available to authorized users.

Highlights

Random forests have often been claimed to uncover interaction effects
When the minor allele frequency (MAF) of the interacting single nucleotide polymorphisms (SNPs) was increased (Fig. 3b), the capture fraction was higher for both importance measures and all interaction models, except for permutation importance and the Redundant model, where the interacting SNPs were almost never ranked in the top 10 SNPs
We conclude that random forests are generally capable of capturing SNP-SNP interactions, but current variable importance measures are unable to detect them
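
To make the notion of a capture fraction concrete, the following is a minimal sketch, not the authors' code. It assumes that an interacting SNP counts as captured in a simulation replicate when it is ranked among the k most important variables (the highlight above mentions the top 10). The function names, SNP names and importance values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def top_k(importances: Dict[str, float], k: int) -> List[str]:
    """Return the names of the k variables with the largest importance."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]


def capture_fraction(
    replicates: Sequence[Dict[str, float]], target: str, k: int = 10
) -> float:
    """Fraction of replicates in which `target` is ranked in the top k.

    Assumption: "captured" means "appears among the k top-ranked variables".
    """
    hits = sum(target in top_k(imp, k) for imp in replicates)
    return hits / len(replicates)


# Made-up importance values for four hypothetical simulation replicates.
replicates = [
    {"snp_a": 0.9, "snp_b": 0.1, "snp_c": 0.5},
    {"snp_a": 0.2, "snp_b": 0.8, "snp_c": 0.6},
    {"snp_a": 0.7, "snp_b": 0.3, "snp_c": 0.1},
    {"snp_a": 0.1, "snp_b": 0.4, "snp_c": 0.9},
]

# snp_a is the top-ranked variable in two of the four replicates.
print(capture_fraction(replicates, "snp_a", k=1))
```

The same function could be applied to Gini or permutation importance values from any random forest implementation, since it only needs one importance value per variable and replicate.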

Summary

Introduction

Random forests have often been claimed to uncover interaction effects [1,2,3,4,5,6,7,8]. This is deduced from the recursive structure of trees, which generally enables them to take dependencies into account in a hierarchical manner. However, if and how interaction effects can be differentiated from marginal effects remains unclear.
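
The hierarchical argument can be illustrated with a small sketch. It uses an XOR outcome on two binary variables as a textbook example of a pure interaction; this is an assumption made for illustration and not necessarily one of the interaction models simulated in the paper. The hand-written `nested_tree` is likewise hypothetical and stands in for a tree that splits on one SNP and then on the other within each branch.

```python
from itertools import product

# All four combinations of two hypothetical binary SNPs, equally weighted.
# The outcome is the XOR of the two SNPs: a pure interaction.
data = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]


def mean_outcome(rows):
    """Average outcome over a list of (snp_a, snp_b, outcome) rows."""
    return sum(y for _, _, y in rows) / len(rows)


def marginal_means(index):
    """Mean outcome for value 0 and value 1 of one SNP, ignoring the other."""
    return tuple(
        mean_outcome([row for row in data if row[index] == value])
        for value in (0, 1)
    )


def nested_tree(a, b):
    """Depth-2 tree: split on SNP A, then on SNP B within each branch."""
    if a == 0:
        return 1 if b == 1 else 0
    return 0 if b == 1 else 1


# Neither SNP shifts the mean outcome on its own ...
print(marginal_means(0), marginal_means(1))
# ... yet the nested splits reproduce the outcome exactly.
print(all(nested_tree(a, b) == y for a, b, y in data))
```

The sketch also hints at the difficulty discussed in the abstract: with no marginal effect, a single first split on either SNP gives no improvement by itself, so whether a greedy tree finds the nested structure depends on the data and the splitting procedure.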

Journal: BMC Bioinformatics
Publication Date: Mar 31, 2016
Citations: 145
License type: CC BY 4.0

Similar Papers

SNP interaction detection with Random Forests in high-dimensional genetic data
Stacey J Winham ... Xin Wang
BMC Bioinformatics | VOL. 13
15 Jul 2012

Bias in random forest variable importance measures: illustrations, sources and a solution.
Carolin Strobl ... Achim Zeileis
BMC Bioinformatics | VOL. 8
25 Jan 2007

Empirical characterization of random forest variable importance measures
Kellie J Archer ... Ryan V Kimes
Computational Statistics & Data Analysis | VOL. 52
30 Aug 2007

Mining data with random forests: A survey and results of new tests
A Verikas ... M Bacauskiene
Pattern Recognition | VOL. 44
12 Aug 2010
