286 Collider-stratification bias when estimating variable importance using Random Forests

Stephanie Long, Tibor Schuster, Genevieve Lefebvre

doi:10.1093/ije/dyab168.399


Open Access

https://doi.org/10.1093/ije/dyab168.399


Abstract

Background: Advances in causal inference have helped explain the longstanding birthweight and obesity paradoxes as selection bias due to conditioning on a collider variable, i.e., collider-stratification bias (CSB). The lessons learned have critical implications for the interpretation of machine learning (ML) methods, including decision trees and random forests (RFs), that implicitly condition on input variables. RFs are a popular approach for identifying important “predictors” in large data sets through variable importance, defined by the average decrease in prediction accuracy. While CSB has become a recognized concern when estimating exposure-outcome effects, knowledge of its impact on ML’s variable importance measures (VIMs) is limited. Applying the causal inference framework, we investigated the accuracy of RFs’ VIMs in data mechanisms prone to CSB.

Methods: A Monte Carlo simulation study was conducted, with binary outcome and collider variables generated from logistic models. Two exposure variables stochastically determined the outcome and a collider variable; the collider was causally independent of the outcome. VIMs from RFs were compared with the known causal relevance of the input variables on the outcome.

Results: While the variable importance of true exposure variables was not systematically affected by CSB, the validity of VIMs can be affected, leading to erroneous selection of collider variables, which are causally independent of the outcome, as outcome predictors.

Conclusions: In the presence of CSB, VIMs are not valid measures of the causal relevance of variables and may mislead the selection of truly important factors that affect the outcome.

Key messages: ML must consider causal data-generating mechanisms; otherwise, it may lead to erroneous assessment of variable importance regarding outcome prediction.
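The mechanism the abstract calls CSB, an association that appears only after conditioning on a collider, can be illustrated with a small simulation. The sketch below is not the authors' simulation design and does not fit a random forest: it assumes two independent binary exposures and a binary collider drawn from a logistic model, and the function names (`simulate`, `log_odds_ratio`), the coefficients, and the sample size are all hypothetical choices made for illustration. It compares the exposure-exposure log odds ratio in the full sample with the same quantity within strata of the collider, which is the kind of conditioning a tree split performs.

```python
import math
import random


def expit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, seed=0, b0=-2.0, b1=2.0, b2=2.0):
    """Draw n records (x1, x2, c); x1 and x2 are independent by construction.

    The coefficients b0, b1, b2 are illustrative assumptions, not values
    taken from the abstract.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = int(rng.random() < 0.5)
        x2 = int(rng.random() < 0.5)
        # Collider: a common effect of both exposures (logistic model).
        c = int(rng.random() < expit(b0 + b1 * x1 + b2 * x2))
        rows.append((x1, x2, c))
    return rows


def log_odds_ratio(rows):
    """Log odds ratio between x1 and x2, with a 0.5 continuity correction."""
    counts = {(a, b): 0.5 for a in (0, 1) for b in (0, 1)}
    for x1, x2, _ in rows:
        counts[(x1, x2)] += 1
    numerator = counts[(1, 1)] * counts[(0, 0)]
    denominator = counts[(1, 0)] * counts[(0, 1)]
    return math.log(numerator / denominator)


rows = simulate(20000)
marginal = log_odds_ratio(rows)                              # no conditioning
within_c1 = log_odds_ratio([r for r in rows if r[2] == 1])   # stratum c = 1
within_c0 = log_odds_ratio([r for r in rows if r[2] == 0])   # stratum c = 0

print(f"marginal log OR:     {marginal:+.2f}")
print(f"log OR within c = 1: {within_c1:+.2f}")
print(f"log OR within c = 0: {within_c0:+.2f}")
```

Under these assumed coefficients the exposures are unassociated in the full sample, while within either collider stratum the theoretical odds ratio is expit(2) × expit(−2) / 0.25 ≈ 0.42 (log odds ratio about −0.87), so the simulated stratum-specific estimates should be clearly negative. An association created purely by stratification like this is what a method that implicitly conditions on input variables could pick up.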


Journal: International Journal of Epidemiology
Publication Date: Sep 1, 2021
Citations: 1

