Abstract

We begin by arguing that stepwise logistic regression, a widely used algorithm for the discovery and use of disease risk factors, is unstable. We then argue that other available algorithms, such as the lasso and gradient boosting, are much more stable and reliable. We propose a protocol for the discovery and use of risk factors based on lasso or boosting variable selection, illustrate it with a set of prostate cancer data, and show that it recovers known risk factors. Finally, we use the protocol to identify new and important SNP-based risk factors for prostate cancer and to seek evidence for or against the hypothesis that selenium has an anticancer function in prostate cancer. We find that the anticancer effect may depend on SNP-SNP interactions and, in particular, on which alleles are present.

Highlights

  • We begin by arguing that stepwise logistic regression, a widely used algorithm for the discovery and use of disease risk factors, is unstable

  • Our paper makes the following points: 1. We summarize some of the studies showing that stepwise regression and its variants, which are used more often than they should be in risk factor studies, are unreliable and may contribute to the irreproducibility of life sciences research discussed by [18]

  • The R functions for variable selection, along with the accompanying papers, are available from Boos[23] and were used as described there


Summary

Introduction

We begin by arguing that stepwise logistic regression, a widely used algorithm for the discovery and use of disease risk factors, is unstable. We propose a protocol for the discovery and use of risk factors based on lasso or boosting variable selection. In the present paper we introduce two newer variable selection methods, the lasso and gradient boosting, which we argue are large improvements over the methods often used at present[1]. A comparison between stepwise regression and lasso/gradient boosting was beyond the scope of the present work, the aim of which was to identify SNP-based risk factors for prostate cancer. Austin and Tu[1] have previously established the instability of stepwise regression, in the sense that it produces so many candidate models that it is practically too labor intensive for use in routine risk factor studies.
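
To make the idea of lasso variable selection concrete, the following is a minimal sketch in plain Python, not the protocol or the R functions used in the paper. It fits an L1-penalized logistic regression by proximal gradient descent on simulated data and reports which predictors keep a nonzero coefficient. The function names (`lasso_logistic`, `simulate`), the penalty value `lam`, the step size, and the simulated effect sizes are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_logistic(X, y, lam, step=1.0, n_iter=300):
    """L1-penalized logistic regression via proximal gradient descent.

    Returns (intercept, coefficients). The intercept is not penalized.
    A coefficient that is exactly 0.0 means the predictor was dropped.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual: predicted probability minus observed outcome.
            r = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= step * g0 / n
        # Gradient step on the log-loss, then soft-threshold each coefficient.
        beta = [soft_threshold(b - step * gj / n, step * lam)
                for b, gj in zip(beta, g)]
    return b0, beta


def simulate(n, true_beta, seed):
    # Hypothetical data: independent standard normal predictors and a
    # binary outcome drawn from a logistic model with coefficients true_beta.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0.0, 1.0) for _ in true_beta]
        prob = sigmoid(sum(b * x for b, x in zip(true_beta, xi)))
        X.append(xi)
        y.append(1 if rng.random() < prob else 0)
    return X, y


if __name__ == "__main__":
    # Two real risk factors (columns 0 and 1) and four pure-noise columns.
    X, y = simulate(300, [2.0, -2.0, 0.0, 0.0, 0.0, 0.0], seed=1)
    _, beta = lasso_logistic(X, y, lam=0.05)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    print("selected predictors:", selected)
    print("coefficients:", [round(b, 3) for b in beta])
```

In practice the penalty would be chosen by cross-validation and the fit repeated over resamples to check how stable the selected set is; an established implementation (for example, the R functions the authors cite) would be preferable to this hand-rolled optimizer.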
