Abstract

Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.

Highlights

  • Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays

  • Whole-genome sequencing (WGS) provides near-complete characterization of genetic variation, but it is still prohibitive for researchers to conduct WGS on the large number of samples that are needed to study phenotypic associations of low-frequency and rare genetic variants (minor allele frequency (MAF) <1–5% and <1%, respectively)

  • We show that imputation accuracy can improve substantially when reference haplotypes are rephased after initial WGS genotype calling


Summary

Introduction

Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. WGS provides near-complete characterization of genetic variation, but it is still prohibitive for researchers to conduct WGS on the large number of samples that are needed to study phenotypic associations of low-frequency and rare genetic variants (minor allele frequency (MAF) <1–5% and <1%, respectively). We describe a novel WGS imputation panel comprising 3,781 samples from the UK10K Cohorts project[6]. We show that this reference panel greatly increases accuracy and coverage of low-frequency variants relative to a panel of 1,092 individuals from the 1000 Genomes Project (1000GP). We present a practical solution for combining imputation reference panels to increase variant coverage, and we introduce a new approximation that maintains the speed of existing approximations while achieving higher accuracy.
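The panel sizes quoted above can be related to the frequency range they cover with a little arithmetic. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the frequency bins (rare below 1%, low-frequency from 1% to 5%, common above 5%) are the conventional ones, assumed here for illustration. It shows that each diploid sample contributes two haplotypes, and estimates how many copies of a minor allele a panel would be expected to carry at a given MAF.

```python
# Illustrative sketch only. Function names and MAF bin boundaries are
# assumptions for this example, not taken from the paper.


def haplotype_count(n_diploid_samples: int) -> int:
    """Each diploid sample contributes two haplotypes to a reference panel."""
    return 2 * n_diploid_samples


def expected_minor_allele_copies(n_haplotypes: int, maf: float) -> float:
    """Expected number of minor-allele copies in a panel at a given MAF."""
    return n_haplotypes * maf


def maf_class(maf: float) -> str:
    """Bin a variant by MAF (assumed bins: <1% rare, 1-5% low-frequency)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie in [0, 0.5]")
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"


uk10k_haps = haplotype_count(3781)    # 7562 haplotypes
kgp_haps = haplotype_count(1092)      # 2184 haplotypes
combined_haps = uk10k_haps + kgp_haps  # 9746 haplotypes

for label, n_haps in [("UK10K", uk10k_haps), ("1000GP", kgp_haps),
                      ("combined", combined_haps)]:
    copies = expected_minor_allele_copies(n_haps, 0.001)
    print(f"{label}: {n_haps} haplotypes, "
          f"about {copies:.1f} expected copies at MAF 0.1%")
```

Under these assumptions, a variant at 0.1% MAF is expected to appear several times among the UK10K haplotypes but only about twice among the 1000GP haplotypes, which is one intuitive reason a larger panel helps at the rare end of the frequency spectrum.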
