Estimating Effect Sizes and Expected Replication Probabilities from GWAS Summary Statistics.

Dominic Holland, Anders M Dale, Ole A Andreassen, Yunpeng Wang, Thomas Werge, Michael O'Donovan, Min-Tzu Lo, Wesley K Thompson, Aree Witoelar, Chi-Hua Chen, Andrew Schork

doi:10.3389/fgene.2016.00015

Abstract

Genome-wide Association Studies (GWAS) result in millions of summary statistics (“z-scores”) for single nucleotide polymorphism (SNP) associations with phenotypes. These rich datasets afford deep insights into the nature and extent of genetic contributions to complex phenotypes such as psychiatric disorders, which are understood to have substantial genetic components that arise from very large numbers of SNPs. The complexity of the datasets, however, poses a significant challenge to maximizing their utility. This is reflected in a need for better understanding the landscape of z-scores, as such knowledge would enhance causal SNP and gene discovery, help elucidate mechanistic pathways, and inform future study design. Here we present a parsimonious methodology for modeling effect sizes and replication probabilities, relying only on summary statistics from GWAS substudies, and a scheme allowing for direct empirical validation. We show that modeling z-scores as a mixture of Gaussians is conceptually appropriate, in particular taking into account ubiquitous non-null effects that are likely in the datasets due to weak linkage disequilibrium with causal SNPs. The four-parameter model allows for estimating the degree of polygenicity of the phenotype and predicting the proportion of chip heritability explainable by genome-wide significant SNPs in future studies with larger sample sizes. We apply the model to recent GWAS of schizophrenia (N = 82,315) and putamen volume (N = 12,596), with approximately 9.3 million SNP z-scores in both cases. We show that, over a broad range of z-scores and sample sizes, the model accurately predicts expectation estimates of true effect sizes and replication probabilities in multistage GWAS designs. We assess the degree to which effect sizes are over-estimated when based on linear-regression association coefficients. We estimate the polygenicity of schizophrenia to be 0.037 and that of putamen volume to be 0.001, while the respective sample sizes required to approach fully explaining the chip heritability are 10^6 and 10^5. The model can be extended to incorporate prior knowledge such as pleiotropy and SNP annotation. The current findings suggest that the model is applicable to a broad array of complex phenotypes and will enhance understanding of their genetic architectures.

Highlights

Many complex traits and common phenotypes have a genetic component that arises from large numbers of genetic loci (Visscher et al., 2012).
We show that modeling z-scores as a mixture of Gaussians is conceptually appropriate, in particular taking into account ubiquitous non-null effects that are likely in the datasets due to weak linkage disequilibrium with causal single nucleotide polymorphisms (SNPs).
The model allows prediction of the replication probability and expected effect size for each SNP, given the discovery and replication sample sizes, the four model parameters, and the discovery sample z-score of the SNP.
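
To make the last highlight concrete, the following is a minimal sketch of how a Gaussian-mixture model can turn a discovery z-score into a shrunken effect estimate and a replication probability. The parameterization (`pi1`, `s0`, `s1`, `sigma`), the function names, and the replication criterion (same sign as discovery, one-sided threshold) are assumptions chosen for illustration; they are not taken from the paper and should not be read as the authors' exact model.

```python
import math

# Illustrative two-component Gaussian mixture for GWAS z-scores.
# ASSUMPTION: this parameterization (pi1, s0, s1, sigma) is hypothetical and
# only mirrors the idea of a four-parameter Gaussian mixture.
# Model: z = sqrt(n) * beta + noise, with noise ~ N(0, sigma^2) and
# beta ~ (1 - pi1) * N(0, s0^2) + pi1 * N(0, s1^2), where 0 < pi1 < 1.


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def log_normal_pdf(x, var):
    """Log density of N(0, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def posterior_weights(z, n, pi1, s0, s1, sigma):
    """Posterior probability of each mixture component given a z-score."""
    logs = [
        math.log(prior) + log_normal_pdf(z, sigma**2 + n * s**2)
        for prior, s in ((1.0 - pi1, s0), (pi1, s1))
    ]
    top = max(logs)  # subtract the max to avoid underflow for large |z|
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def shrinkage(n, s, sigma):
    """Fraction of the observed z-score attributed to true effect."""
    return n * s**2 / (sigma**2 + n * s**2)


def expected_noncentrality(z, n, pi1, s0, s1, sigma):
    """E[sqrt(n) * beta | z]: the discovery z-score shrunk toward zero."""
    w = posterior_weights(z, n, pi1, s0, s1, sigma)
    return sum(wk * shrinkage(n, s, sigma) * z for wk, s in zip(w, (s0, s1)))


def replication_probability(z_d, n_d, n_r, pi1, s0, s1, sigma,
                            z_threshold=1.645):
    """P(replication z-score has the discovery sign and exceeds z_threshold)."""
    sign = 1.0 if z_d >= 0 else -1.0
    w = posterior_weights(z_d, n_d, pi1, s0, s1, sigma)
    prob = 0.0
    for wk, s in zip(w, (s0, s1)):
        # Posterior of beta given z_d within this component.
        mean_beta = shrinkage(n_d, s, sigma) * z_d / math.sqrt(n_d)
        var_beta = sigma**2 * s**2 / (sigma**2 + n_d * s**2)
        # Predictive distribution of the replication z-score.
        mu_r = math.sqrt(n_r) * mean_beta
        sd_r = math.sqrt(n_r * var_beta + sigma**2)
        prob += wk * (1.0 - normal_cdf((z_threshold - sign * mu_r) / sd_r))
    return prob


# Hypothetical parameter values, chosen only to exercise the functions.
example = dict(pi1=0.01, s0=0.0, s1=0.05, sigma=1.0)
print(expected_noncentrality(5.0, 50000, **example))
print(replication_probability(5.0, 50000, 20000, **example))
```

The shrinkage factor shows why raw association estimates overstate effects: a z-score is only partly signal, and the weaker the prior evidence for a non-null effect, the more of it is attributed to noise.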

Summary

Introduction

Many complex traits and common phenotypes have a genetic component that arises from large numbers of genetic loci (Visscher et al., 2012). The total effect of the genetic component on phenotypic expression is often substantial, as indicated by measures of heritability (Tenesa and Haley, 2013; Witte et al., 2014) obtained from twin and family studies and genome-wide association studies (GWAS) for multiple phenotypes. GWAS provide a platform for uncovering the underlying genetic architecture, but this poses a substantial challenge, compounded by the complexity of the datasets: ∼10^4–10^5 individuals with ∼10^7 genetic markers (single nucleotide polymorphisms, or SNPs) in various levels of correlation (linkage disequilibrium, or LD), ∼10^6 of which are estimated to be independent (Dudbridge and Gusnanto, 2008; Pe’er et al., 2008), with multiple possible roles for SNPs in mechanistic pathways. With the number of markers much larger than the number of individuals in GWAS, modeling assumptions are required to estimate parameters of interest and thereby obtain realistic descriptions of the numbers, distributions, and effect sizes of causal SNPs (and of the considerably larger number of SNPs in strong LD with causal SNPs), which in turn can assist in causal SNP discovery and individual risk prediction, and inform mechanistic understanding of genetic effects in phenotypic expression.
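
As a small worked example of the arithmetic behind these figures, the sketch below shows how roughly one million independent SNPs translates into a significance threshold. It assumes a Bonferroni correction at a family-wise error rate of 0.05; that choice and the variable names are illustrative and are not stated in the text above.

```python
from statistics import NormalDist

# ASSUMPTION: Bonferroni correction at a family-wise error rate of 0.05
# over about one million independent SNPs (the count cited in the text).
family_wise_alpha = 0.05
independent_snps = 1_000_000

# Per-SNP p-value threshold.
p_threshold = family_wise_alpha / independent_snps

# Corresponding two-sided threshold on the absolute z-score.
z_threshold = -NormalDist().inv_cdf(p_threshold / 2.0)

print(p_threshold, z_threshold)
```

Under these assumptions the per-SNP threshold is 5 × 10^-8, so a SNP needs an absolute z-score of about 5.45 to count as significant.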

Journal: Frontiers in Genetics
Publication Date: Feb 16, 2016
Citations: 36
License type: CC BY
