Abstract

Logistic regression is often used to support medical decisions with binary outcomes. Here we evaluate several methods for selecting variables in logistic regression, using a large dataset to predict the diagnosis of myocardial infarction in patients presenting to an emergency room with chest pain. Our results indicate that some of the examined methods are well suited to variable selection in logistic regression, and that our model and the accompanying myocardial infarction risk calculator can serve as an additional tool to help physicians diagnose myocardial infarction.

Highlights

  • Logistic regression is a statistical technique for predicting the probability of an event, given a set of predictor variables

  • For each method implemented in this study (BPSO, VBCM, Simulated Annealing (SA), and random stochastic search), the logistic regression coefficients were estimated on the training set

  • Model performance was measured on the testing set by the c-index, i.e., the area under the Receiver Operating Characteristic (ROC) curve [10]
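
The c-index used for evaluation equals the ROC area: the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. A minimal pure-Python sketch of that definition follows; the function name `c_index` is hypothetical, and the pairwise loop is O(n_pos × n_neg), so it is meant for illustration rather than for a large dataset.

```python
from itertools import product


def c_index(y_true, scores):
    """Concordance index (ROC area) for a binary outcome.

    Counts, over all (positive, negative) pairs, how often the positive case
    has the higher score; tied scores contribute one half.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p, n in product(pos, neg):
        if p > n:
            concordant += 1.0
        elif p == n:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))
```

For example, with outcomes [0, 0, 1, 1] and predicted risks [0.1, 0.4, 0.35, 0.8], three of the four positive/negative pairs are ordered correctly, giving a c-index of 0.75.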


Summary

Introduction

Logistic regression is a statistical technique for predicting the probability of an event given a set of predictor variables. Variable selection is an important consideration when building logistic regression models. It is often addressed by sequential methods that start from a set of variables and grow or shrink it, at each step choosing which variable to add or remove. These hill-climbing approaches are traditionally called forward, backward, and stepwise (or composite) selection [3]. For a survey of variable selection methods, written for linear regression but applicable to logistic regression as well, see [6].
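
Forward selection can be sketched as follows for the model p(x) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk))). This is a minimal sketch under stated assumptions, not the procedure from [3] or from this study: it assumes AIC = 2k - 2 log L as the criterion for adding a variable and stopping, and fits coefficients by plain gradient ascent on the log-likelihood instead of a production solver. The names `fit_logistic` and `forward_selection`, and the synthetic data, are hypothetical.

```python
import math
import random


def softplus(t):
    # Stable log(1 + e^t), avoiding overflow for large |t|.
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + e^-z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(beta, cols, row):
    # Linear predictor b0 + sum of b_j * x_j over the selected columns.
    return beta[0] + sum(b * row[c] for b, c in zip(beta[1:], cols))


def fit_logistic(X, y, cols, lr=0.5, iters=300):
    """Fit an intercept plus columns `cols` of X by gradient ascent on the
    mean log-likelihood (assumed fitting method). Returns (beta, log L)."""
    n = len(y)
    beta = [0.0] * (len(cols) + 1)  # beta[0] is the intercept
    for _ in range(iters):
        grad = [0.0] * len(beta)
        for row, yi in zip(X, y):
            r = yi - sigmoid(linear(beta, cols, row))  # residual y - p
            grad[0] += r
            for k, c in enumerate(cols, start=1):
                grad[k] += r * row[c]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    loglik = 0.0
    for row, yi in zip(X, y):
        z = linear(beta, cols, row)
        # log p = -softplus(-z); log(1 - p) = -softplus(z)
        loglik -= softplus(-z) if yi == 1 else softplus(z)
    return beta, loglik


def forward_selection(X, y, candidates):
    """Greedy forward selection: at each step add the candidate column that
    most lowers AIC (assumed criterion); stop when no addition lowers it."""
    selected = []
    _, ll = fit_logistic(X, y, selected)
    best_aic = 2 * 1 - 2 * ll  # intercept-only model, k = 1
    remaining = list(candidates)
    while remaining:
        trials = []
        for c in remaining:
            _, ll = fit_logistic(X, y, selected + [c])
            k = len(selected) + 2  # intercept + current set + candidate
            trials.append((2 * k - 2 * ll, c))
        aic, c = min(trials)
        if aic >= best_aic:
            break  # no single addition improves the criterion
        best_aic = aic
        selected.append(c)
        remaining.remove(c)
    return selected


if __name__ == "__main__":
    # Hypothetical synthetic data: column 0 drives the outcome, column 1 is noise.
    rng = random.Random(0)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    y = [1 if 2.0 * row[0] + rng.gauss(0, 1) > 0 else 0 for row in X]
    print(forward_selection(X, y, [0, 1]))
```

Backward elimination is the mirror image (start from all candidates and drop the variable whose removal most lowers the criterion), and stepwise selection alternates the two moves. Because each step only looks one variable ahead, all three are greedy hill climbers and can stop at a local optimum.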
