Simultaneous discovery, estimation and prediction analysis of complex traits using a Bayesian mixture model.

Gerhard Moser, Naomi R Wray, Ben J Hayes, Michael E Goddard, Peter M Visscher, Sang Hong Lee, Chris Haley

doi:10.1371/journal.pgen.1004969

Abstract

Gene discovery, estimation of heritability captured by SNP arrays, inference on genetic architecture and prediction analyses of complex traits are usually performed using different statistical models and methods, leading to inefficiency and loss of power. Here we use a Bayesian mixture model that simultaneously allows variant discovery, estimation of genetic variance explained by all variants and prediction of unobserved phenotypes in new samples. We apply the method to simulated data of quantitative traits and Wellcome Trust Case Control Consortium (WTCCC) data on disease and show that it provides accurate estimates of SNP-based heritability, produces unbiased estimators of risk in new samples, and that it can estimate genetic architecture by partitioning variation across hundreds to thousands of SNPs. We estimated that, depending on the trait, 2,633 to 9,411 SNPs explain all of the SNP-based heritability in the WTCCC diseases. The majority of those SNPs (>96%) had small effects, confirming a substantial polygenic component to common diseases. The proportion of the SNP-based variance explained by large effects (each SNP explaining 1% of the variance) varied markedly between diseases, ranging from almost zero for bipolar disorder to 72% for type 1 diabetes. Prediction analyses demonstrate that for diseases with major loci, such as type 1 diabetes and rheumatoid arthritis, Bayesian methods outperform profile scoring or mixed model approaches.

Highlights

Genome-wide association studies (GWAS) have been used for three different purposes: to map genetic variants causing variation in a trait, to estimate the genetic variance explained by all the single nucleotide polymorphisms (SNPs) that have been genotyped, and to predict the genetic value or future phenotype of individuals.
We used a Bayesian mixture model and a priori assumed a mixture of four zero-mean normal distributions of SNP effects (β), where the relative variance for each mixture component is fixed [8]: $p(\beta_j \mid \pi, \sigma_g^2) = \pi_1 \times N(0, 0 \times \sigma_g^2) + \pi_2 \times N(0, 10^{-4} \times \sigma_g^2) + \pi_3 \times N(0, 10^{-3} \times \sigma_g^2) + \pi_4 \times N(0, 10^{-2} \times \sigma_g^2)$. Here, $\pi$ are the mixture proportions, which are constrained to sum to unity, and $\sigma_g^2$ is the additive genetic variance explained by SNPs.
Causal effects were drawn from three groups of effect sizes, the first containing 10 SNPs with moderate effects, the second containing 310 SNPs with smaller effects, and a large group of 2,680 SNPs representing a polygenic component (S1 Fig.), where the definitions of moderate, small and polygenic effect size match those of the prior assumptions of the Bayesian mixture model (BayesR).
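
The four-component prior described in the highlights can be illustrated by drawing SNP effects from it directly. The following is a minimal sketch, not the authors' implementation: the function name `sample_snp_effects`, the mixture proportions and the value of the genetic variance are hypothetical choices made only for illustration. Only the relative variances (0, 10^-4, 10^-3, 10^-2) come from the text.

```python
import numpy as np

# Relative variances of the four mixture components, as stated in the prior.
COMPONENT_SCALES = np.array([0.0, 1e-4, 1e-3, 1e-2])


def sample_snp_effects(n_snps, pi, sigma2_g, rng):
    """Draw SNP effects from a four-component zero-mean normal mixture.

    pi       : four mixture proportions that sum to one (assumed values)
    sigma2_g : additive genetic variance explained by SNPs (assumed value)
    """
    pi = np.asarray(pi, dtype=float)
    if pi.shape != (4,) or not np.isclose(pi.sum(), 1.0):
        raise ValueError("pi must hold four proportions that sum to one")
    # Assign each SNP to one mixture component with probability pi.
    component = rng.choice(4, size=n_snps, p=pi)
    # Per-SNP standard deviation; it is exactly zero for the first component.
    sd = np.sqrt(COMPONENT_SCALES[component] * sigma2_g)
    beta = rng.normal(0.0, 1.0, size=n_snps) * sd
    return beta, component


# Hypothetical proportions and variance, chosen only for illustration.
rng = np.random.default_rng(0)
beta, component = sample_snp_effects(
    n_snps=10_000, pi=[0.95, 0.03, 0.015, 0.005], sigma2_g=0.5, rng=rng
)
```

SNPs assigned to the first component have an effect of exactly zero, which is how the prior allows most variants to drop out of the model while the remaining three components cover polygenic, small and moderate effects.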

Summary

Introduction

Genome-wide association studies (GWAS) have been used for three different purposes: to map genetic variants causing variation in a trait, to estimate the genetic variance explained by all the single nucleotide polymorphisms (SNPs) that have been genotyped, and to predict the genetic value or future phenotype of individuals. These analyses are usually performed using different statistical models and methods. To estimate the variance explained by all the SNPs together, all genotyped or imputed SNPs can be included in the model simultaneously with their effects treated as random variables all drawn from a normal distribution with zero mean and constant variance. This gives an unbiased estimate of the variance explained, but all the estimated SNP effects are non-zero [5].
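
The point that a single normal prior leaves every estimated SNP effect non-zero can be seen in a small numerical sketch. With the ratio of residual to per-SNP effect variance treated as known, the estimates under this model reduce to ridge regression. The simulated genotypes, the sample sizes, the number of causal SNPs and the penalty `lam` below are all hypothetical, and the function name `ridge_snp_effects` is introduced only for this illustration.

```python
import numpy as np


def ridge_snp_effects(X, y, lam):
    """Solve (X'X + lam * I) b = X'y for centred genotypes X and phenotype y.

    lam plays the role of the residual-to-SNP-effect variance ratio,
    which is assumed known here.
    """
    n_snps = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_snps), X.T @ y)


rng = np.random.default_rng(1)
n, m = 200, 50  # hypothetical numbers of individuals and SNPs

# Simulated allele counts (0, 1 or 2) with allele frequency 0.3, then centred.
X = rng.binomial(2, 0.3, size=(n, m)).astype(float)
X -= X.mean(axis=0)

# Only the first five SNPs are causal in this toy example.
true_beta = np.zeros(m)
true_beta[:5] = 0.5
y = X @ true_beta + rng.normal(0.0, 1.0, size=n)
y -= y.mean()

beta_hat = ridge_snp_effects(X, y, lam=10.0)
n_nonzero = int(np.count_nonzero(beta_hat))
```

Although 45 of the 50 simulated SNPs have no true effect, the shrinkage estimator assigns each of them a small non-zero value, so this model on its own does not separate causal from null variants.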

Methods

Results

Discussion

Conclusion