Bayesian multiple logistic regression for case-control GWAS.

Saikat Banerjee, Lingyao Zeng, Johannes Söding, Heribert Schunkert

doi:10.1371/journal.pgen.1007856

Abstract

Genetic variants in genome-wide association studies (GWAS) are tested for disease association mostly using simple regression, one variant at a time. Standard approaches to improve power in detecting disease-associated SNPs use multiple regression with Bayesian variable selection in which a sparsity-enforcing prior on effect sizes is used to avoid overtraining and all effect sizes are integrated out for posterior inference. For binary traits, the logistic model has not yielded clear improvements over the linear model. For multi-SNP analysis, the logistic model required costly and technically challenging MCMC sampling to perform the integration. Here, we introduce the quasi-Laplace approximation to solve the integral and avoid MCMC sampling. We expect the logistic model to perform much better than multiple linear regression except when predicted disease risks are spread closely around 0.5, because only close to its inflection point can the logistic function be well approximated by a linear function. Indeed, in extensive benchmarks with simulated phenotypes and real genotypes, our Bayesian multiple LOgistic REgression method (B-LORE) showed considerable improvements (1) when regressing on many variants in multiple loci at heritabilities ≥ 0.4 and (2) for unbalanced case-control ratios. B-LORE also enables meta-analysis by approximating the likelihood functions of individual studies by multivariate normal distributions, using their means and covariance matrices as summary statistics. Our work should make sparse multiple logistic regression attractive also for other applications with binary target variables. B-LORE is freely available from: https://github.com/soedinglab/b-lore.
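The claim that the logistic function is well approximated by a linear function only near its inflection point can be checked numerically. The following is a minimal sketch, not part of B-LORE: the function names are hypothetical, and the linear model is represented here simply by the first-order Taylor expansion of the logistic function around 0, where the predicted risk is 0.5.

```python
import math

def logistic(x: float) -> float:
    """Logistic function mapping a linear predictor x to a risk in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tangent_at_inflection(x: float) -> float:
    """First-order Taylor expansion of the logistic function around x = 0.

    The slope is logistic'(0) = 0.25 and the value at 0 is 0.5.
    """
    return 0.5 + x / 4.0

def approximation_error(x: float) -> float:
    """Absolute gap between the logistic function and its tangent line."""
    return abs(logistic(x) - tangent_at_inflection(x))

# Illustrative linear predictors, from close to the inflection point to far away.
for x in (0.1, 0.5, 1.0, 2.0, 4.0):
    print(
        f"x={x:>4}: risk={logistic(x):.3f}, "
        f"linear={tangent_at_inflection(x):.3f}, "
        f"error={approximation_error(x):.3f}"
    )
```

The error stays small while the predicted risk is close to 0.5 and grows quickly as the risk approaches 0 or 1, where the tangent line even leaves the interval (0, 1).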

Highlights

Common, noninfectious diseases are responsible for over two-thirds of deaths worldwide.
Genome-wide association studies (GWAS) have become the primary approach for identifying genetic variants associated with the origin of complex diseases.
The Bayesian multiple LOgistic REgression method (B-LORE) provides the best ranking of single nucleotide polymorphisms (SNPs), followed by the multiple regression methods (BVSR and FINEMAP), which in turn are better than single-SNP meta-analysis (META).

Summary

Introduction

Genome-wide association studies (GWAS) have opened up a fundamentally new approach to identifying novel regions of the genome that are associated with complex human diseases. GWAS have identified thousands of genetic variants, mostly single nucleotide polymorphisms (SNPs), associated with many diseases and complex traits [1, 2]. In a typical GWAS, genotype data comprising millions of SNPs from thousands of individuals with some trait are analyzed to identify SNPs that have significant associations with the trait. A typical analysis uses a linear model to regress the trait on the minor allele counts of the SNP. Case-control GWAS, for which the binary trait is either “diseased” (“cases”) or “healthy” (“controls”), use a logistic model for regression.
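To make the single-SNP case-control setting concrete, here is a minimal sketch of a logistic regression of case status on the minor allele count of one SNP, fitted by Newton-Raphson. It is an illustration only and not the B-LORE method: the function name, the toy genotypes, and the toy phenotypes are all hypothetical, and real analyses would also include covariates and significance testing.

```python
import math

def fit_single_snp_logistic(allele_counts, is_case, n_iter=25):
    """Fit logit P(case) = b0 + b1 * allele_count by Newton-Raphson.

    allele_counts: minor allele count (0, 1 or 2) per individual.
    is_case: 1 for cases, 0 for controls.
    Returns the intercept b0 and the per-allele log odds ratio b1.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Gradient of the log-likelihood and entries of the 2x2 information matrix.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(allele_counts, is_case):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g, with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical toy data: 12 individuals, 4 per genotype class.
allele_counts = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
is_case = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

b0, b1 = fit_single_snp_logistic(allele_counts, is_case)
print(f"intercept={b0:.4f}, log odds ratio per minor allele={b1:.4f}")
print(f"odds ratio per minor allele={math.exp(b1):.4f}")
```

In this toy data the case fractions are 1/4, 2/4 and 3/4 for 0, 1 and 2 minor alleles, so the fitted odds ratio per minor allele should come out close to 3. A multi-SNP model of the kind discussed in this paper would replace the single allele count with a vector of counts and place a prior on the effect sizes.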

Journal: PLOS Genetics	Publication Date: Dec 31, 2018
Citations: 31	License type: CC BY 4.0
