Abstract

Background

Logistic regression has been the de facto, and often the only, model used to describe and analyze relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given the predictors, as well as predictor effect size estimates in the form of conditional odds ratios.

Results

We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimates and conditional effect size estimates. Effect size estimates from learning machines leverage the counterfactual arguments central to the interpretation of such estimates. We show that, if the data-generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correctly specified logistic model, both for main effects and interactions. We also propose a method that uses learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented.

Conclusions

The models we propose make no assumptions about the data structure; they capture the patterns in the data by specifying only the predictors involved, not any particular model form. They therefore do not run the same risks of model mis-specification, and the resulting estimation biases, as a logistic model. This methodology, which we call a “risk machine”, inherits its properties from the statistical learning machine it is derived from.
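As a minimal, hedged illustration of the idea (a sketch with assumed names and simulated data, not code from the paper), the following Python fragment fits a random forest regression to a {0,1} outcome so that its predictions estimate P(Y = 1 | X), and then forms a counterfactual effect size for one predictor by comparing the predicted risks with that predictor set to each of its two levels.

# Hedged sketch of a random forest probability machine (RFPM); the
# simulated data, variable names, and tuning values are illustrative
# assumptions, not taken from the paper.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)            # continuous predictor
x2 = rng.binomial(1, 0.4, size=n)  # binary predictor of interest
logit = -0.5 + 0.8 * x1 + 0.6 * x2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))  # {0,1} outcome
X = np.column_stack([x1, x2])

# Random forest *regression* on the binary outcome: its predictions estimate
# the conditional probability P(Y = 1 | X) directly, with no link function.
rfpm = RandomForestRegressor(n_estimators=500, min_samples_leaf=25, random_state=0)
rfpm.fit(X, y)

# Counterfactual effect size for x2: predict every subject's risk with x2
# forced to 1 and then to 0, holding the other predictor values fixed.
X_on, X_off = X.copy(), X.copy()
X_on[:, 1], X_off[:, 1] = 1, 0
p_on = np.clip(rfpm.predict(X_on), 1e-6, 1 - 1e-6)
p_off = np.clip(rfpm.predict(X_off), 1e-6, 1 - 1e-6)

risk_difference = np.mean(p_on - p_off)
odds_ratio = np.mean(p_on / (1 - p_on)) / np.mean(p_off / (1 - p_off))
print("risk difference:", round(risk_difference, 3), "odds ratio:", round(odds_ratio, 3))

Because a regression forest (rather than a classification forest) is used, the predictions can be read directly as probabilities, and the same counterfactual comparison yields risk differences as naturally as odds ratios; the interaction scan mentioned above is not reproduced in this sketch.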

Highlights

  • Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features

  • We recently introduced the concept of a probability machine (PM) [1], which is any consistent nonparametric regression machine applied to binary or categorical outcomes

  • We focus on random forest regression used with a {0,1} outcome and call it a random forest probability machine (RFPM)



Introduction

Logistic regression has been the de facto, and often the only, model used to describe and analyze relationships between a binary outcome and observed features, both categorical and continuous. It is widely used both as an association model and as a predictive model, to estimate (a) the conditional probability of the outcome given the predictors and (b) predictor effect sizes in the form of conditional odds ratios. It is widely available in software and is easy to optimize in low-dimensional problems. The challenge is to specify all the main effects and interactions (two-way and higher order) correctly in the model; otherwise, efficient and consistent estimation is not guaranteed.
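For reference alongside the earlier sketch, the hedged fragment below fits a logistic regression to simulated data (again with illustrative names and a made-up data-generating model, here using statsmodels): the fitted model supplies (a) conditional probabilities and (b) conditional odds ratios as exponentiated coefficients, and the comment marks the specification burden noted in this paragraph.

# Hedged logistic-regression baseline; the data, variable names, and
# coefficients are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
x2 = rng.binomial(1, 0.4, size=n)
logit = -0.5 + 0.8 * x1 + 0.6 * x2 + 0.5 * x1 * x2  # true model includes an interaction
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# The analyst must specify the main effects *and* the x1*x2 interaction;
# omitting the interaction term here would bias the coefficient-based estimates.
X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
fit = sm.Logit(y, X).fit(disp=False)

p_hat = fit.predict(X)                # (a) conditional probabilities P(Y = 1 | X)
odds_ratios = np.exp(fit.params[1:])  # (b) conditional odds ratios for x1, x2, x1*x2
print("estimated odds ratios:", np.round(odds_ratios, 3))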
