SU22 - BIVARIATE GAUSSIAN MIXTURE MODEL FOR GWAS SUMMARY STATISTICS

Oleksandr Frei, Olav Smeland, Dominic Holland, Alexey Shadrin, Wesley Thompson, Ole Andreassen, Anders Dale

doi:10.1016/j.euroneuro.2017.08.211

Abstract

Background: Identifying shared genetics is important because it uncovers hidden relationships between complex traits and improves our understanding of disease etiology. Today, genetic correlation is commonly used as the principal measure of genetic overlap. Available methods can calculate genetic correlation from raw genotypes (restricted maximum likelihood, polygenic risk scores), from a set of single-nucleotide polymorphisms (SNPs) that pass the genome-wide significance threshold (Mendelian randomization), or from all data in genome-wide association studies (GWAS), including SNPs that do not reach genome-wide significance (cross-trait linkage disequilibrium (LD) score regression). These methods cannot capture a mixture of effect directions across shared genetic variants (i.e., they report only an overall positive, negative or zero genetic correlation), whereas recent analyses suggest that the cross-trait correlation in SNP effect size and direction is not the same across all SNPs.

Methods: The bivariate Gaussian mixture model provides a novel way to detect and quantify shared genetics independently of genetic correlation. In the absence of genetic correlation we use another measure of polygenic overlap, expressed as the proportion of SNPs associated with both traits. To estimate this quantity, we model true per-SNP effect sizes as a mixture of four bivariate normal distributions: two causal components, one specific to each trait; one causal component of SNPs affecting both traits; and a null component of SNPs with no effect on either trait. We interpret the weight of each component in the mixture as the proportion of SNPs in that component. The parameters of the mixture model are estimated from the summary statistics by direct maximization of the likelihood. Our statistical model relates observed signed test statistics (GWAS z-scores) to the underlying per-SNP effect sizes, incorporating the effects of LD structure, minor allele frequency, sample size, cryptic relatedness / sample stratification, and sample overlap.

Results: We show in simulations that our model distinguishes cases with no polygenic overlap from cases with significant overlap in the absence of genetic correlation. We show genetic overlap between schizophrenia, waist-hip ratio and triglycerides, even though the genetic correlation is close to zero. We apply our model to schizophrenia, bipolar disorder and IQ data, and show that in the presence of genetic correlation our model reports results consistent with those from cross-trait LD score regression.

Discussion: It appears to be common practice to interpret a lack of genetic correlation as no relation between traits. However, lack of correlation does not necessarily imply independence, a fact well known from statistics. Thus, we argue that certain important cases of genetic overlap are not captured by genetic correlation. Our model quantifies genetic overlap between traits both with and without genetic correlation. It controls for the fact that if both traits are highly polygenic, then some proportion of SNPs is associated with both traits by chance or due to LD structure. In addition to providing insights into genetic architecture, our model can improve SNP discovery by estimating the posterior effect size for one trait given the observed GWAS z-score for another trait. Furthermore, more accurately estimated effect sizes can be used to improve polygenic risk scores.
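
The four-component mixture of true effect sizes described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `sample_effects`, the parameter names (`weights`, `sigma1`, `sigma2`, `rho12`) and all numeric values are hypothetical, and the sketch draws true effects only. It does not model GWAS z-scores, LD structure, allele frequency, sample size or sample overlap, and it does not fit the model. It shows how a non-zero proportion of shared SNPs can coexist with a near-zero correlation of effects.

```python
import numpy as np


def sample_effects(n_snps, weights, sigma1, sigma2, rho12, seed=0):
    """Draw per-SNP effect pairs (beta1, beta2) from a four-component mixture.

    weights = (pi0, pi1, pi2, pi12): proportions of SNPs that are null,
    causal for trait 1 only, causal for trait 2 only, and causal for both.
    All names and values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("mixture weights must sum to 1")

    # Assign every SNP to one component: 0=null, 1=trait 1, 2=trait 2, 3=shared.
    component = rng.choice(4, size=n_snps, p=weights)
    beta = np.zeros((n_snps, 2))

    only1 = component == 1
    only2 = component == 2
    shared = component == 3

    # Trait-specific components: an effect on one trait, exactly zero on the other.
    beta[only1, 0] = rng.normal(0.0, sigma1, only1.sum())
    beta[only2, 1] = rng.normal(0.0, sigma2, only2.sum())

    # Shared component: bivariate normal with within-component correlation rho12.
    cov = np.array([
        [sigma1 ** 2, rho12 * sigma1 * sigma2],
        [rho12 * sigma1 * sigma2, sigma2 ** 2],
    ])
    beta[shared] = rng.multivariate_normal(np.zeros(2), cov, size=shared.sum())
    return beta, component


# Hypothetical scenario: 1% of SNPs affect both traits, with uncorrelated effects.
beta, component = sample_effects(
    n_snps=200_000, weights=(0.97, 0.01, 0.01, 0.01),
    sigma1=1.0, sigma2=1.0, rho12=0.0,
)
overlap = np.mean(component == 3)                    # proportion of shared SNPs
corr = np.corrcoef(beta[:, 0], beta[:, 1])[0, 1]     # correlation of true effects
print(f"shared proportion: {overlap:.4f}, effect correlation: {corr:.3f}")
```

In this hypothetical setting the shared proportion stays near the chosen mixture weight while the correlation of effects stays near zero, which is the kind of overlap that a correlation-based measure alone would not reveal.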
