Abstract
Genetic variants in genome-wide association studies (GWAS) are tested for disease association mostly using simple regression, one variant at a time. Standard approaches to improve power in detecting disease-associated SNPs use multiple regression with Bayesian variable selection in which a sparsity-enforcing prior on effect sizes is used to avoid overtraining and all effect sizes are integrated out for posterior inference. For binary traits, the logistic model has not yielded clear improvements over the linear model. For multi-SNP analysis, the logistic model required costly and technically challenging MCMC sampling to perform the integration. Here, we introduce the quasi-Laplace approximation to solve the integral and avoid MCMC sampling. We expect the logistic model to perform much better than multiple linear regression except when predicted disease risks are spread closely around 0.5, because only close to its inflection point can the logistic function be well approximated by a linear function. Indeed, in extensive benchmarks with simulated phenotypes and real genotypes, our Bayesian multiple LOgistic REgression method (B-LORE) showed considerable improvements (1) when regressing on many variants in multiple loci at heritabilities ≥ 0.4 and (2) for unbalanced case-control ratios. B-LORE also enables meta-analysis by approximating the likelihood functions of individual studies by multivariate normal distributions, using their means and covariance matrices as summary statistics. Our work should make sparse multiple logistic regression attractive also for other applications with binary target variables. B-LORE is freely available from: https://github.com/soedinglab/b-lore.
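The claim that the logistic function is well approximated by a linear function only near its inflection point can be checked numerically. The following is a minimal sketch, not part of B-LORE: the helper names (`logistic`, `linear_approx`, `approx_error`) are hypothetical, and the linear function used here is simply the first-order Taylor expansion around a predicted risk of 0.5.

```python
import math

def logistic(eta):
    """Logistic function mapping a linear predictor eta to a disease risk."""
    return 1.0 / (1.0 + math.exp(-eta))

def linear_approx(eta):
    """First-order Taylor expansion of the logistic function around eta = 0,
    its inflection point, where the predicted risk is 0.5."""
    return 0.5 + eta / 4.0

def approx_error(eta):
    """Absolute gap between the logistic risk and its linear approximation."""
    return abs(logistic(eta) - linear_approx(eta))

# Illustrative values only: the gap is tiny near eta = 0 and grows quickly
# as the predicted risk moves away from 0.5.
for eta in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"eta={eta:4.1f}  risk={logistic(eta):.3f}  "
          f"linear={linear_approx(eta):.3f}  error={approx_error(eta):.3f}")
```

Under these assumptions, the error stays small while risks remain close to 0.5. For large linear predictors the linear approximation even leaves the valid probability range [0, 1].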
Highlights
Common, noninfectious diseases are responsible for over two-thirds of deaths worldwide
Genome-wide association studies (GWAS) have become the primary approach for identifying genetic variants associated with the origin of complex diseases
The Bayesian multiple LOgistic REgression method (B-LORE) provides the best ranking of single nucleotide polymorphisms (SNPs), followed by the multiple regression methods (BVSR and FINEMAP), which in turn are better than single-SNP meta-analysis (META)
Summary
Genome-wide association studies (GWAS) have opened up a fundamentally new approach to identifying novel regions of the genome that are associated with complex human diseases. GWAS have identified thousands of genetic variants, mostly single nucleotide polymorphisms (SNPs), associated with many diseases and complex traits [1, 2]. In a typical GWAS, genotype data comprising millions of SNPs from thousands of individuals are analyzed to identify SNPs that have significant associations with a trait of interest. GWAS with quantitative traits use a linear model for regression of the trait by the minor allele counts of the SNP. Case-control GWAS, for which the binary trait is either “diseased” (“cases”) or “healthy” (“controls”), use a logistic model for regression.
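The single-SNP logistic regression described above can be sketched as follows. This is a minimal illustration and not the B-LORE method. It fits one SNP at a time with plain Newton-Raphson, with no prior and no covariates. The function name `fit_single_snp_logistic` and the toy genotype counts are hypothetical and chosen only for demonstration.

```python
import math

def fit_single_snp_logistic(genotypes, phenotypes, n_iter=25):
    """Fit logit P(case) = b0 + b1 * minor_allele_count by Newton-Raphson.

    genotypes:  minor allele counts per individual (0, 1 or 2)
    phenotypes: 1 for cases, 0 for controls
    Returns (b0, b1, standard error of b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Gradient (g) and negative Hessian (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(genotypes, phenotypes):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g, with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error of b1 from the inverse of the last evaluated Hessian.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1

# Toy data (made up): (minor allele count, number of controls, number of cases)
toy_counts = [(0, 30, 10), (1, 20, 20), (2, 4, 8)]
genotypes, phenotypes = [], []
for count, n_controls, n_cases in toy_counts:
    genotypes += [count] * (n_controls + n_cases)
    phenotypes += [0] * n_controls + [1] * n_cases

b0, b1, se = fit_single_snp_logistic(genotypes, phenotypes)
print(f"log odds ratio per minor allele: {b1:.3f} (SE {se:.3f}, Wald z {b1 / se:.2f})")
```

In a real GWAS this fit would be repeated for every SNP, usually with covariates such as age, sex and ancestry components, which this sketch leaves out.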