Across-cohort QC analyses of GWAS summary statistics from complex traits

The Genetic Investigation of Anthropometric Traits (GIANT) Consortium, Guo-Bo Chen, Zoltán Kutalik, Ruth J F Loos, Joel N Hirschhorn, Damien C Croteau-Chonka, Zhi-Xiang Zhu, Peter M Visscher, Maciej Trzaskowski, Adam E Locke, Felix R Day, Sang Hong Lee, Thomas W Winkler, Andrew R Wood, Naomi R Wray, Jian Yang, Timothy M Frayling, Matthew R Robinson

doi:10.1038/ejhg.2016.106

Abstract

Genome-wide association studies (GWASs) have been successful in discovering SNP-trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small, so large genome-wide association meta-analyses (GWAMAs) are required to maximize statistical power. The trend towards ever-larger GWAMAs is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false-positive and false-negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information and summary statistics, and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and its geographic location, and demonstrate using GWAMA data from the GIANT Consortium that the geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors; it does not require the sharing of individual-level genotype data and does not breach individual privacy.
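As a rough illustration of metric (III), the sketch below compares the reported effect sizes of two cohorts SNP by SNP. It assumes a statistic of the form (b1 - b2)^2 / (se1^2 + se2^2), scaled by the median of a chi-square distribution with 1 degree of freedom. This form, and the function name `pairwise_lambda`, are assumptions made for illustration; the abstract does not spell out the statistic, and this is not the GEAR implementation.

```python
import numpy as np

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def pairwise_lambda(beta1, se1, beta2, se2):
    """Median-scaled squared difference of effect sizes for two cohorts.

    Hypothetical helper (an assumption, not the paper's code). Inputs are
    per-SNP effect estimates and standard errors for the same SNPs, with
    effects aligned to the same reference allele.
    """
    beta1 = np.asarray(beta1, dtype=float)
    beta2 = np.asarray(beta2, dtype=float)
    se1 = np.asarray(se1, dtype=float)
    se2 = np.asarray(se2, dtype=float)

    # If the cohorts are independent and share the same true effects,
    # each element of t is approximately chi-square with 1 df.
    t = (beta1 - beta2) ** 2 / (se1 ** 2 + se2 ** 2)
    return float(np.median(t) / CHI2_1DF_MEDIAN)
```

Under these assumptions, a value near 1 is what independent cohorts with homogeneous effects would produce. Values well below 1 suggest positively correlated estimation errors, as expected when samples are shared, and values well above 1 suggest heterogeneity between the two cohorts.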

Highlights

To elucidate genetic architecture, which requires maximized statistical power for the discovery of risk alleles of small effect, large genome-wide association meta-analyses (GWAMAs) are tending towards an ever-larger scale and may contain data from hundreds of cohorts.
Population genetic quality control (QC) analysis using Fst: in GWAMA, only summary statistics such as allele frequencies are available to the central analysis hub, so it is difficult to identify population outliers.
We propose that a genetic distance inferred from Fst, which reflects the genetic distance between pairs of populations, is a useful additional QC statistic to detect cohorts that are population outliers.
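The Fst-based check described in these highlights can be sketched from reported allele frequencies alone. The example below is a minimal sketch that assumes the simple heterozygosity-based form Fst = (Ht - Hs) / Ht, pooled over SNPs as a ratio of sums, with no correction for cohort sample size. The function names `pairwise_fst` and `fst_matrix` are hypothetical, and the estimator used in the paper may differ.

```python
import numpy as np


def pairwise_fst(p1, p2):
    """Fst between two cohorts from per-SNP allele frequencies.

    Assumption: equal weighting of the two cohorts and no sample-size
    correction; frequencies refer to the same allele in both cohorts.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)

    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity in the pooled population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within the two cohorts.
    h_within = p1 * (1.0 - p1) + p2 * (1.0 - p2)

    # Skip SNPs that are monomorphic in both cohorts.
    keep = h_total > 0
    if not np.any(keep):
        return 0.0
    return float(np.sum(h_total[keep] - h_within[keep]) / np.sum(h_total[keep]))


def fst_matrix(freqs):
    """Symmetric matrix of pairwise Fst for a cohorts-by-SNPs frequency array."""
    freqs = np.asarray(freqs, dtype=float)
    n = freqs.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = out[j, i] = pairwise_fst(freqs[i], freqs[j])
    return out
```

A cohort whose row in the resulting matrix is consistently large relative to the others would be a candidate population outlier, under the assumptions stated above.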

Summary

Introduction

To elucidate genetic architecture, which requires maximized statistical power for the discovery of risk alleles of small effect, large genome-wide association meta-analyses (GWAMAs) are tending towards an ever-larger scale and may contain data from hundreds of cohorts. At the individual cohort level, genome-wide association study (GWAS) analysis is often based on different genotyping chips and conducted with different protocols, such as different software tools and reference populations for imputation, the inclusion of study-specific covariates, and association analyses using different methods and software. We propose a new set of QC metrics for GWAMA. All these applications assume that there is a central analysis hub, to which summary statistic data from the GWAS of each cohort are uploaded. All proposed methods are implemented in the freely available software GEAR.
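Because the central hub sees only per-cohort summary statistics, a cohort-level principal component analysis has to work on reported allele frequencies rather than genotypes. The sketch below illustrates one way to do this, assuming a cohorts-by-SNPs matrix of frequencies for the same allele, column centering, and a plain singular value decomposition. The function name `cohort_pca` is hypothetical, and the procedure implemented in GEAR may differ.

```python
import numpy as np


def cohort_pca(freqs, n_components=2):
    """PCA of cohorts based on reported allele frequencies.

    freqs: array of shape (n_cohorts, n_snps), frequencies of the same
    allele in every cohort (an assumption of this sketch).
    Returns the cohort scores on the leading components and the
    proportion of variance explained by each of them.
    """
    x = np.asarray(freqs, dtype=float)
    # Center each SNP across cohorts so components capture differences
    # between cohorts rather than the average frequency.
    x = x - x.mean(axis=0)

    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]

    total = np.sum(s ** 2)
    explained = s ** 2 / total if total > 0 else np.zeros_like(s)
    return scores, explained[:n_components]
```

Plotting the first two columns of `scores` would place cohorts with similar allele frequencies close together, which is the kind of picture one could compare against each cohort's reported ancestry.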

Journal: European Journal of Human Genetics	Publication Date: Aug 24, 2016
Citations: 20	License type: open-access
