Abstract

When modeling the risk of a disease, the choice of which factors to include can heavily affect the results. This study compares the performance of several variable selection techniques applied to logistic regression. We performed realistic simulation studies to compare five methods of variable selection: (1) a confidence interval (CI) approach for significant coefficients, (2) backward selection, (3) forward selection, (4) stepwise selection, and (5) Bayesian stochastic search variable selection (SSVS) using both informed and uninformed priors. We defined our simulated diseases to mimic odds ratios for cancer risk found in the literature for environmental factors, such as smoking; dietary risk factors, such as fiber; genetic risk factors, such as XPD; and interactions. We modeled the distribution of our covariates, including their correlation, after the reported empirical distributions of these risk factors. We also used a null data set to calibrate the priors of the Bayesian method and evaluate its sensitivity. Of the standard methods (95 per cent CI, backward, forward, and stepwise selection), the CI approach resulted in the highest average per cent of correct associations and the lowest average per cent of incorrect associations. SSVS with an informed prior had a higher average per cent of correct associations and a lower average per cent of incorrect associations than the CI approach. This study shows that Bayesian methods offer a way to use prior information to both increase power and decrease false-positive results when selecting factors to model complex disease risk.
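To make the CI approach and the two reported performance measures concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the CI approach keeps a factor when the Wald interval for its logistic regression coefficient excludes zero, that "correct associations" means the per cent of true risk factors selected, and that "incorrect associations" means the per cent of null factors selected. The function names, the coefficient estimates, the standard errors, and the `noise_1`/`noise_2` factors are all hypothetical.

```python
from statistics import NormalDist


def wald_ci(beta, se, level=0.95):
    """Wald confidence interval for a log-odds coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return beta - z * se, beta + z * se


def select_by_ci(estimates, level=0.95):
    """Keep factors whose interval excludes zero (assumed CI rule).

    estimates maps a factor name to (coefficient, standard error).
    """
    selected = set()
    for name, (beta, se) in estimates.items():
        lower, upper = wald_ci(beta, se, level)
        if lower > 0 or upper < 0:
            selected.add(name)
    return selected


def association_rates(selected, true_factors, candidates):
    """Per cent of true factors selected and per cent of null factors selected."""
    true_factors = set(true_factors)
    null_factors = set(candidates) - true_factors
    correct = 100 * len(selected & true_factors) / len(true_factors) if true_factors else 0.0
    incorrect = 100 * len(selected & null_factors) / len(null_factors) if null_factors else 0.0
    return correct, incorrect


# Hypothetical fitted coefficients (log odds ratios) and standard errors.
estimates = {
    "smoking": (0.69, 0.20),
    "fiber": (-0.22, 0.15),
    "XPD": (0.10, 0.12),
    "noise_1": (0.30, 0.14),  # null factor, no true effect in this toy setup
    "noise_2": (0.02, 0.15),  # null factor, no true effect in this toy setup
}

selected = select_by_ci(estimates)
correct, incorrect = association_rates(
    selected, {"smoking", "fiber", "XPD"}, estimates.keys()
)
print(sorted(selected), round(correct, 1), round(incorrect, 1))
```

In a simulation study, these two percentages would be averaged over many simulated data sets for each selection method; the sketch above evaluates a single hypothetical fit.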

