Abstract

Background: Automatic variable selection methods are usually discouraged in medical research, although we believe they might be valuable for studies where subject matter knowledge is limited. Bayesian model averaging may be useful for model selection, but only limited attempts to compare it to stepwise regression have been published. We therefore performed a simulation study to compare stepwise regression with Bayesian model averaging.

Methods: We simulated data corresponding to five different data generating processes and thirty different values of the effect size (the parameter estimate divided by its standard error). Each data generating process contained twenty explanatory variables in total and had between zero and two true predictors. Three data generating processes consisted of uncorrelated predictor variables, while two had a mixture of correlated and uncorrelated variables. We fitted linear regression models to the simulated data, used Bayesian model averaging and stepwise regression as model selection procedures, and compared the estimated selection probabilities.

Results: When the redundant variable was not correlated with a true predictor, the estimated probability of not selecting it was between 0.99 and 1 for Bayesian model averaging and approximately 0.95 for stepwise regression. These probabilities did not depend on the effect size of the true predictor. When a redundant variable was correlated with a true predictor, the probability of not selecting it was 0.95 to 1 for Bayesian model averaging and between 0.7 and 0.9 for stepwise regression, depending on the effect size of the true predictor. The probability of selecting a true predictor increased with the effect size of the true predictor, leveling out at between 0.9 and 1 for stepwise regression and at 1 for Bayesian model averaging.

Conclusions: Our simulation study showed that, under the given conditions, Bayesian model averaging had a higher probability of not selecting a redundant variable than stepwise regression and a similar probability of selecting a true predictor. Medical researchers building regression models with limited subject matter knowledge could thus benefit from using Bayesian model averaging.
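The simulation design in the Methods can be illustrated with a minimal, pure-Python sketch of one simulation replicate. Everything in it is an assumption for illustration, not the authors' code: the sample size, the hypothetical helpers `simulate` and `backward_stepwise`, the choice of backward elimination, and the retention rule |t| ≥ 1.96 (a normal approximation to a 5% two-sided test) are not specified in this section. The sketch draws twenty independent explanatory variables with one true predictor, scales its coefficient so that the expected t-statistic (the effect size as defined above) matches a target value, and then runs a stepwise selection.

```python
import math
import random

def invert(a):
    """Gauss-Jordan inversion with partial pivoting (fine for small matrices)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def ols(X, y, cols):
    """OLS of y on an intercept plus the columns `cols` of X.
    Returns (coefficients, standard errors), intercept first."""
    rows = [[1.0] + [r[j] for j in cols] for r in X]
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)  # unbiased residual variance
    return beta, [math.sqrt(sigma2 * inv[i][i]) for i in range(p)]

def simulate(n, k, effect_size, rng):
    """Hypothetical DGP: k independent N(0,1) variables, only x1 (index 0) is a true
    predictor, N(0,1) noise. With unit variances the t-statistic of x1 is roughly
    beta1 * sqrt(n), so we set beta1 = effect_size / sqrt(n)."""
    beta1 = effect_size / math.sqrt(n)
    X = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    y = [beta1 * r[0] + rng.gauss(0.0, 1.0) for r in X]
    return X, y

def backward_stepwise(X, y, z_crit=1.96):
    """Hypothetical stepwise rule: start from the full model and repeatedly drop
    the variable with the smallest |t| while that |t| is below z_crit."""
    cols = list(range(len(X[0])))
    while cols:
        beta, se = ols(X, y, cols)
        t = [abs(b / s) for b, s in zip(beta[1:], se[1:])]  # skip the intercept
        worst = min(range(len(cols)), key=lambda i: t[i])
        if t[worst] >= z_crit:
            break
        cols.pop(worst)
    return cols

rng = random.Random(1)
X, y = simulate(n=200, k=20, effect_size=8.0, rng=rng)
beta, se = ols(X, y, list(range(20)))
t_x1 = beta[1] / se[1]  # effect size: estimate divided by its standard error
selected = backward_stepwise(X, y)
print(f"t-statistic of x1: {t_x1:.2f}; stepwise keeps columns {selected}")
```

Repeating such a replicate many times and counting how often each variable is kept yields Monte Carlo estimates of selection probabilities like those reported in the Results.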

Highlights

  • Automatic variable selection methods are usually discouraged in medical research, but we believe they might be valuable for studies where subject matter knowledge is limited.

  • Probability of not selecting a redundant variable: for data generating process 1, Bayesian model averaging with a 95% threshold almost never selects redundant variables, Bayesian model averaging with a 50% threshold selects a redundant variable about once in a hundred, and stepwise regression selects a redundant variable with probability 0.05.

  • Probability of selecting an indirect predictor (data generating process 5): for Bayesian model averaging with a 95% threshold, the probability of selecting an indirect predictor (x2 in data generating process 5) was approximately constant at 0, whereas for stepwise regression it increased to approximately 0.2 for effect sizes corresponding to a t-statistic between 0 and 3; at a t-statistic of approximately 7, the probability decreased and leveled out at approximately 0.1 (Figure 3e).
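The 50% and 95% thresholds in the highlights are presumably cut-offs on each variable's posterior inclusion probability under Bayesian model averaging. One common way to obtain such probabilities is to approximate posterior model probabilities with BIC and sum them over the models that contain each variable. The sketch below does this by enumerating all 2^k models, which is only practical for small k (six variables here, not the twenty used in the study). The BIC approximation, the uniform prior over models, the hypothetical helpers `rss`, `inclusion_probabilities` and `bma_select`, and the simulated data are assumptions for illustration; this section does not state the study's priors or software.

```python
import itertools
import math
import random

def rss(X, y, cols):
    """Residual sum of squares of OLS of y on an intercept plus columns `cols` of X."""
    rows = [[1.0] + [r[j] for j in cols] for r in X]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y], solved by Gaussian elimination
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, p):
            f = a[r][c] / a[c][c]
            a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return sum((yi - sum(bj * v for bj, v in zip(b, r))) ** 2 for r, yi in zip(rows, y))

def inclusion_probabilities(X, y):
    """Posterior inclusion probability of each variable: enumerate all 2^k models,
    weight each by exp(-BIC/2) (uniform model prior), and sum per variable."""
    n, k = len(y), len(X[0])
    models, bics = [], []
    for size in range(k + 1):
        for cols in itertools.combinations(range(k), size):
            p = size + 1  # parameters including the intercept
            bics.append(n * math.log(rss(X, y, cols) / n) + p * math.log(n))
            models.append(set(cols))
    best = min(bics)  # subtract the minimum BIC to avoid underflow in exp
    w = [math.exp(-(b - best) / 2) for b in bics]
    total = sum(w)
    return [sum(wi for wi, m in zip(w, models) if j in m) / total for j in range(k)]

def bma_select(pips, threshold):
    """Select variables whose inclusion probability reaches the threshold."""
    return [j for j, pip in enumerate(pips) if pip >= threshold]

rng = random.Random(2)
n, k = 100, 6
X = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
y = [0.6 * r[0] + rng.gauss(0.0, 1.0) for r in X]  # only x1 (index 0) matters
pips = inclusion_probabilities(X, y)
print([round(q, 3) for q in pips])
print("50% rule:", bma_select(pips, 0.50), "95% rule:", bma_select(pips, 0.95))
```

Because every variable selected at the 95% threshold also passes the 50% threshold, the stricter rule can only select fewer variables, which is consistent with its lower rate of selecting redundant variables in the highlights.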


Summary

Introduction

Automatic variable selection methods are usually discouraged in medical research, but we believe they might be valuable for studies where subject matter knowledge is limited. Our interest is not in testing a limited number of well-defined hypotheses but in describing associations between potential predictors and the outcome. It is hard, if not impossible, to manually assess all combinations of predictor variables, even when we ignore the possibility of interactions. In such scenarios there are strong arguments for making use of data-driven model selection methods (ideally in conjunction with subject matter knowledge, if there is any). In this context, false positives (type I errors) can be a major problem [2,3,4].
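The claim that manual assessment of all combinations is impractical can be made concrete with a short count. Taking twenty candidate variables as an example (the number used in the simulated data generating processes), each variable is either in or out of a main-effects model, giving 2^20 candidate models; allowing all two-way interactions as candidate terms raises the number of terms to 20 + C(20, 2). The helper names below are hypothetical and chosen only for this illustration.

```python
import math

def n_models(k):
    """Main-effects models from k candidate predictors: each is either in or out."""
    return 2 ** k

def n_models_by_size(k):
    """Same count summed over model sizes j = 0..k (binomial theorem)."""
    return sum(math.comb(k, j) for j in range(k + 1))

def n_candidate_terms(k):
    """Candidate terms if all two-way interactions are also allowed."""
    return k + math.comb(k, 2)

print(n_models(20))           # 1048576
print(n_candidate_terms(20))  # 210, i.e. 2**210 possible subsets of terms
```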
