Provable Boolean interaction recovery from tree ensemble obtained via random forests

Merle Behr, Yu Wang, Xiao Li, Bin Yu

doi:10.1073/pnas.2118636119

Open Access

https://doi.org/10.1073/pnas.2118636119

Abstract

Random Forests (RFs) are at the cutting edge of supervised machine learning in terms of prediction performance, especially in genomics. Iterative RFs (iRFs) use a tree ensemble from iteratively modified RFs to obtain predictive and stable nonlinear or Boolean interactions of features. They have shown great promise for Boolean biological interaction discovery, which is central to advancing functional genomics and precision medicine. However, theoretical studies of how tree-based methods discover Boolean feature interactions are missing. Inspired by the thresholding behavior in many biological processes, we first introduce a discontinuous nonlinear regression model, called the “Locally Spiky Sparse” (LSS) model. Specifically, the LSS model assumes that the regression function is a linear combination of piecewise constant Boolean interaction terms. Given an RF tree ensemble, we define a quantity called “Depth-Weighted Prevalence” (DWP) for a set of signed features S±. Intuitively speaking, DWP(S±) measures how frequently features in S± appear together in an RF tree ensemble. We prove that, with high probability, DWP(S±) attains a universal upper bound that does not involve any model coefficients if and only if S± corresponds to a union of Boolean interactions under the LSS model. Consequently, we show that a theoretically tractable version of the iRF procedure, called LSSFind, yields consistent interaction discovery under the LSS model as the sample size goes to infinity. Finally, simulation results show that LSSFind recovers the interactions under the LSS model, even when some assumptions are violated.
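
To make the LSS description above concrete, the following is a minimal sketch of how data of that general shape could be simulated: a response equal to an intercept plus a sum of coefficients, each switched on by a product of threshold indicators over signed features. The abstract does not give the model's exact form, so everything here is an illustrative assumption: the function names `lss_mean` and `simulate_lss`, the uniform features on [0, 1], the Gaussian noise, and the specific thresholds and coefficients. It does not implement DWP or LSSFind.

```python
import random


def lss_mean(x, interactions, intercept=0.0):
    """Noise-free response for one feature vector x (hypothetical helper).

    interactions is a list of (coef, terms); each term is a tuple
    (feature_index, threshold, sign). sign > 0 means the term is active
    when x[k] > threshold, otherwise when x[k] <= threshold.
    """
    total = intercept
    for coef, terms in interactions:
        # A Boolean interaction fires only if every signed feature passes.
        fires = all(
            (x[k] > gamma) if sign > 0 else (x[k] <= gamma)
            for k, gamma, sign in terms
        )
        if fires:
            total += coef
    return total


def simulate_lss(n, p, interactions, intercept=0.0, noise_sd=0.1, seed=0):
    """Draw n samples with p uniform features (assumed) and Gaussian noise."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(p)] for _ in range(n)]
    y = [
        lss_mean(x, interactions, intercept) + rng.gauss(0.0, noise_sd)
        for x in X
    ]
    return X, y


# Two illustrative interactions on disjoint feature sets:
# (feature 0 high AND feature 1 low) and (feature 2 high).
example_interactions = [
    (2.0, [(0, 0.5, +1), (1, 0.5, -1)]),
    (1.0, [(2, 0.3, +1)]),
]
X, y = simulate_lss(200, 5, example_interactions)
```

The noise-free response takes only finitely many values, one per combination of active interactions, which is the piecewise constant behavior the abstract describes.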

Journal: Proceedings of the National Academy of Sciences
Publication Date: May 24, 2022
Citations: 6
License type: CC BY 4.0

Similar Papers

Learning epistatic polygenic phenotypes with Boolean interactions.
Aldo Cordova-Palomera ... Rima Arnaout
PLOS ONE | VOL. 19
16 Apr 2024

Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data
Angelica M Walker ... David Kainer
Computational and Structural Biotechnology Journal | VOL. 20
01 Jan 2021

A Kolmogorov-Smirnov test for the molecular clock based on Bayesian ensembles of phylogenies.
Fernando Antoneli ... Luciano R Lopes
PLOS ONE | VOL. 13
04 Jan 2018

Iterative random forests to discover predictive and stable high-order interactions
Sumanta Basu ... Bin Yu
Proceedings of the National Academy of Sciences | VOL. 115
19 Jan 2018
