Abstract

As most of the heritability of complex traits is attributed to common and low-frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discovering the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-larger reference panels is cumbersome, as it requires reimputing the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to impute the summary statistics directly, an approach termed summary statistics imputation, which we improved to accommodate variable sample size across SNVs. Its performance relative to genotype imputation and its practical utility have not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that genotype imputation has a 3- to 5-fold lower root-mean-square error and better distinguishes true associations from null ones. We observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01, 0.05, using summary statistics imputation yielded a decrease in statistical power by 9, 43 and 35%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian randomisation or LD-score regression.

Highlights

  • Genome-wide association studies (GWASs) have been successfully applied to reveal genetic markers associated with hundreds of traits and diseases

  • In our work we compare the performance of summary statistics imputation to genotype imputation

  • We demonstrate the potential of summary statistics imputation by presenting 34 novel height-associated loci, 19 of which were confirmed in UK Biobank

Introduction

Genome-wide association studies (GWASs) have been successfully applied to reveal genetic markers associated with hundreds of traits and diseases. Thousands of individuals have been sequenced [1, 2] to provide high-resolution haplotypes for genotype imputation tools such as IMPUTE and minimac [3, 4], which are able to infer sequence variants with ever-increasing accuracy as the reference haplotype set grows. Downstream analyses such as Mendelian randomisation [5], approximate conditional analysis [6], heritability estimation [7], and enrichment analysis using high-resolution annotation (such as DHS) [8] often require genome-wide association results at the highest possible genomic resolution. For imputation of the UK Biobank data, summary statistics imputation is about 500 times faster than genotype imputation (8.3 vs 4200 CPU days, comparing our SSIMP software [11] to Minimac [4]).
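
To make the idea of imputing summary statistics directly more concrete, the following is a minimal sketch of the textbook formulation, in which the Z-statistic of an untyped variant is estimated as an LD-weighted combination of the Z-statistics of nearby typed variants (a conditional expectation under a multivariate normal model). It is not taken from the SSIMP source code: the function names, the shrinkage parameter `lam`, and the particular form of shrinkage are assumptions made for illustration, and the sketch ignores the variable-sample-size correction described in the paper.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Bring the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def impute_z(z_typed: Vector, ld_typed: Matrix, ld_target: Vector,
             lam: float = 0.1) -> float:
    """Impute the Z-statistic of one untyped variant (illustrative only).

    z_typed   : Z-statistics of the typed variants
    ld_typed  : LD (correlation) matrix among typed variants, from a reference panel
    ld_target : LD between the untyped variant and each typed variant
    lam       : hypothetical shrinkage weight in [0, 1]; an assumption of this sketch
    """
    n = len(z_typed)
    # Assumed shrinkage: pull the LD matrix toward the identity and scale the
    # cross-LD vector by the same factor, to stabilise the linear solve.
    shrunk = [
        [(1.0 - lam) * ld_typed[i][j] + (lam if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    cross = [(1.0 - lam) * c for c in ld_target]
    # weights = shrunk^{-1} cross; the LD matrix is symmetric, so the imputed
    # Z-statistic is the dot product of these weights with the typed Z-statistics.
    weights = solve(shrunk, cross)
    return sum(w * z for w, z in zip(weights, z_typed))
```

As a sanity check on the sketch, an untyped variant in LD 0.8 with a single typed variant of Z = 5 is imputed at roughly Z = 4 when `lam=0.0`, and shrinkage pulls the estimate toward zero as `lam` grows. Because only a small linear system per region is solved, rather than phasing and imputing individual-level genotypes, this kind of computation is far cheaper than genotype imputation.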
