Abstract
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that ensemble methods offered substantial improvement over conventional regression trees in both in-sample and out-of-sample prediction of cardiovascular mortality. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease.
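As an illustration of what bootstrap aggregation (bagging) of regression trees involves, the following is a minimal sketch and not the authors' analysis. It assumes a single hypothetical predictor (called `age` here), a small synthetic data set, and depth-1 trees (stumps) in place of full regression trees. All function names are illustrative.

```python
import random

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one split on x, mean outcome in each leaf."""
    overall = sum(ys) / len(ys)
    # (sum of squared errors, threshold, left-leaf mean, right-leaf mean)
    best = (float("inf"), None, overall, overall)
    # Skip the largest value: splitting there would leave the right leaf empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def predict_stump(stump, x):
    """Return the leaf mean for x, read as a predicted event probability."""
    t, lm, rm = stump
    if t is None:  # no valid split was found; fall back to the overall mean
        return lm
    return lm if x <= t else rm

def bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Fit one stump per bootstrap sample (drawn with replacement, same size as the data)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict_bagged(stumps, x):
    """Average the predictions of all bootstrap-fitted stumps."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)

# Synthetic data (an assumption for illustration only): event frequency is
# about 5% below age 65 and about 50% from age 65 upward.
age = [40 + 0.25 * i for i in range(200)]
died = [1 if (a >= 65 and i % 2 == 0) or i % 20 == 0 else 0
        for i, a in enumerate(age)]

stumps = bagged_stumps(age, died, n_trees=50, seed=0)
low = predict_bagged(stumps, 45.0)
high = predict_bagged(stumps, 85.0)
print(f"predicted risk at 45: {low:.3f}, at 85: {high:.3f}")
```

Because each stump is fitted to a different bootstrap sample, the split points vary from tree to tree, and averaging them gives a smoother predicted probability than any single tree would.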
Highlights
Predicting the probability of the occurrence of a binary outcome or event is of key importance in many areas of clinical and health services research
A logistic regression model that incorporated restricted cubic smoothing splines resulted in the greatest out-of-sample predictive accuracy when using the Effective Cardiac Treatment (EFFECT) Follow-up sample as the validation sample
We examined the ability of ensemble-based methods to predict the probability of 30-day mortality in patients who were hospitalized with either an acute myocardial infarction (AMI) or congestive heart failure (CHF)
Summary
Predicting the probability of the occurrence of a binary outcome or event is of key importance in many areas of clinical and health services research. Accurate prediction of the probability of patient outcomes, such as mortality, allows for effective risk stratification of subjects and for the comparison of health care outcomes across different providers. Logistic regression is the most commonly used method for prediction in the biomedical literature. Many clinical investigators are interested in the use of regression trees to predict the probability of the occurrence of an event. Despite studies highlighting the inferior predictive accuracy of regression trees compared to that of logistic regression (Ennis et al., 1998; Austin, 2007), some authors continue to.
© 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim