Abstract

The current trend in genome-wide association studies is to identify regions where the true disease-causing genes may lie by evaluating thousands of single-nucleotide polymorphisms (SNPs) across the whole genome. However, detecting disease-causing genes among thousands of SNPs poses many challenges, including multicollinearity and multiple testing, especially when a large number of correlated SNPs are tested simultaneously. Multicollinearity often occurs when predictor variables in a multiple regression model are highly correlated, and it can cause imprecise estimation of association. In this study, we propose a simple stepwise procedure that identifies disease-causing SNPs simultaneously by employing elastic-net regularization, a variable selection method that allows one to address multicollinearity. In Step 1, single-marker association analysis is conducted to screen SNPs. In Step 2, multiple-marker association is assessed using elastic-net regularization. The proposed approach was applied to the rheumatoid arthritis (RA) case-control data set of Genetic Analysis Workshop 16. While the SNPs selected at the screening step are located mostly on chromosome 6, the elastic-net approach identified a higher proportion of putative RA-related SNPs on other chromosomes. For some of those putative RA-related SNPs, we identified interactions with sex, a well-known factor affecting RA susceptibility.

Highlights

  • Genome-wide association studies (GWAS) have become a promising new tool for deciphering the genetics of complex diseases, which are usually polygenic and affected by gene-by-environmental interactions

  • We propose a simple stepwise procedure that employs the elastic-net regularization-based approach [5] to take the overall correlation structure of single-nucleotide polymorphisms (SNPs) into account when selecting disease-causing genes automatically in genome-wide association studies (GWAS)

  • The proposed approach was applied to the rheumatoid arthritis (RA) case-control dataset of Genetic Analysis Workshop 16 (GAW16)


Summary

Introduction

BMC Proceedings 2009, 3(Suppl 7):S25 http://www.biomedcentral.com/1753-6561/3/S7/S25

Genome-wide association studies (GWAS) have become a promising new tool for deciphering the genetics of complex diseases, which are usually polygenic and affected by gene-by-environmental interactions. [...] sequential selection processes, these methods may miss the overall correlation structure of the genes. Another example is random forest [4], in which true disease-causing genes can be masked by other genes, so the identification result may not be robust. We propose a simple stepwise procedure that employs the elastic-net regularization-based approach [5] to take the overall correlation structure of single-nucleotide polymorphisms (SNPs) into account when selecting disease-causing genes automatically in GWAS. We eliminate most noise SNPs via single-marker association tests, and select the largest number of candidate SNPs that can be analyzed by the elastic-net approach at the second step. Interactions between SNPs and non-genotypic factors (e.g., sex) can be examined
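The two-step idea described above (single-marker screening, then a joint elastic-net fit on the surviving SNPs) can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration and is not taken from the paper: the screening score (n times the squared genotype-status correlation), the proximal-gradient solver, the function names, the penalty settings `lam` and `alpha`, and the toy data, in which SNP 0 is causal and SNP 1 is a correlated copy of it.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def association_score(genotype, status):
    # Illustrative single-marker score: n * r^2, where r is the Pearson
    # correlation between genotype (0/1/2) and case status (0/1).
    n = len(status)
    mg, ms = sum(genotype) / n, sum(status) / n
    sgs = sum((g - mg) * (s - ms) for g, s in zip(genotype, status))
    sgg = sum((g - mg) ** 2 for g in genotype)
    sss = sum((s - ms) ** 2 for s in status)
    return 0.0 if sgg == 0 or sss == 0 else n * sgs * sgs / (sgg * sss)


def screen_snps(G, status, keep):
    # Step 1: rank SNPs by their single-marker score, keep the top `keep`.
    p = len(G[0])
    scores = [association_score([row[j] for row in G], status) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:keep]


def soft_threshold(z, t):
    # Proximal operator of the L1 penalty.
    return z - t if z > t else z + t if z < -t else 0.0


def fit_elastic_net_logistic(X, y, lam=0.05, alpha=0.5, step=0.1, n_iter=300):
    # Step 2: logistic regression with penalty
    # lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2), fitted by proximal
    # gradient descent on column-centred predictors (intercept unpenalised,
    # and reported on the centred scale).
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(Xc, y)]
        b0 -= step * sum(resid) / n
        for j in range(p):
            grad = sum(r * row[j] for r, row in zip(resid, Xc)) / n
            z = beta[j] - step * (grad + lam * (1 - alpha) * beta[j])
            beta[j] = soft_threshold(z, step * lam * alpha)
    return b0, beta


# Toy data (hypothetical): 300 subjects, 30 SNPs, minor allele frequency 0.3.
random.seed(0)
n, p, maf = 300, 30, 0.3
G, status = [], []
for _ in range(n):
    row = [sum(random.random() < maf for _ in range(2)) for _ in range(p)]
    if random.random() < 0.9:
        row[1] = row[0]  # SNP 1 mostly copies SNP 0 (correlated markers)
    status.append(1 if random.random() < sigmoid(-1.0 + 1.2 * row[0]) else 0)
    G.append(row)

kept = screen_snps(G, status, keep=5)            # Step 1: screening
X = [[row[j] for j in kept] for row in G]
b0, beta = fit_elastic_net_logistic(X, status)   # Step 2: joint selection
selected = [j for j, b in zip(kept, beta) if b != 0.0]
print("screened SNPs:", kept)
print("selected SNPs:", selected)
```

In this sketch the L1 part of the penalty sets some coefficients exactly to zero, while the ridge part tends to share weight among correlated SNPs, which is the property the text relies on when it refers to the overall correlation structure. A real analysis would more likely use an established elastic-net implementation and a formal single-marker test rather than this hand-rolled solver.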

