Abstract

Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures and, hence, the full extent of common genetic variation. Here, we propose sequencing the full genomes of a small subset of an underrepresented study cohort, both to inform the selection of population-specific add-on tag SNPs and to generate an internal population-specific imputation reference panel, so that the remaining array-genotyped cohort can be imputed more accurately. Using a Tanzania-based cohort as a proof of concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on tags to the base H3Africa array.

Highlights

  • By mapping the associations between single-nucleotide polymorphisms (SNPs) and various phenotypes, genome-wide association studies (GWAS) have allowed us to gain unprecedented knowledge on the genetic basis of various human diseases and traits

  • Untyped variants are statistically inferred through a process known as genotype imputation, where correlations between variants observed in external reference panels are leveraged to infer untyped variants in the study population

  • Imputation can be suboptimal in underrepresented populations because typed variants incorporated on existing genotyping arrays can be unsuitable for the study population, and haplotype structures can differ between the reference and the study population


Introduction

By mapping the associations between single-nucleotide polymorphisms (SNPs) and various phenotypes, genome-wide association studies (GWAS) have yielded unprecedented knowledge of the genetic basis of various human diseases and traits. Genotyping arrays directly type only a sparse set of tag SNPs (e.g. millions of SNPs) and rely on imputation to achieve acceptable genome-wide density (e.g. tens of millions of variants). The quality of imputation depends on the suitability of the tag SNPs and on the similarity of haplotype structure between the reference panel and the study population [2,3,4,5]. For study populations where a genetically similar reference panel or population-specific array content may not be available, whole-genome sequencing (WGS) offers an alternative to genotyping arrays. Previous studies have suggested that WGS may offer substantial gains in such a scenario, potentially pinpointing loci absent from GWAS conducted using genotyping arrays [6,7]. However, because GWAS often require large sample sizes to achieve sufficient statistical power, the cost of WGS can still be prohibitive despite its recent decrease [8].
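To make the idea of tag SNP suitability more concrete, the following is a minimal sketch of how add-on tags might be chosen from a sequenced subset of a cohort. It is not the authors' pipeline: the greedy strategy, the function names `r_squared` and `select_addon_tags`, the `budget` parameter, and the r² threshold of 0.8 are all assumptions made for illustration. Genotypes are assumed to be coded as 0/1/2 allele counts, and linkage disequilibrium is approximated by the squared Pearson correlation between genotype vectors.

```python
from statistics import fmean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype (0/1/2) vectors."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) * (a - mx) for a in x)
    syy = sum((b - my) * (b - my) for b in y)
    if sxx == 0 or syy == 0:
        # Monomorphic variant: correlation is undefined, treat as untaggable.
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def select_addon_tags(genotypes, base_tags, budget, threshold=0.8):
    """Greedily pick up to `budget` add-on tags (hypothetical helper).

    genotypes: list of variants, each a list of 0/1/2 calls per sequenced sample.
    base_tags: indices of variants already present on the base array.
    A variant counts as "tagged" if some tag has r^2 >= threshold with it.
    """
    n = len(genotypes)
    r2 = [[r_squared(genotypes[i], genotypes[j]) for j in range(n)]
          for i in range(n)]

    tags = set(base_tags)
    untagged = {
        v for v in range(n)
        if v not in tags and all(r2[v][t] < threshold for t in tags)
    }

    addons = []
    while untagged and len(addons) < budget:
        # Choose the candidate that covers the most still-untagged variants;
        # ties are broken in favour of the lowest index.
        best = max(
            sorted(untagged),
            key=lambda c: sum(r2[c][v] >= threshold for v in untagged),
        )
        covered = {v for v in untagged if r2[best][v] >= threshold}
        if not covered:
            # Only monomorphic variants remain; nothing more can be tagged.
            break
        addons.append(best)
        untagged -= covered
    return addons


# Toy data: 5 variants typed in 6 sequenced individuals.
toy_genotypes = [
    [0, 1, 2, 0, 1, 2],  # variant 0: already on the base array
    [0, 1, 2, 0, 1, 2],  # variant 1: perfectly tagged by variant 0
    [2, 0, 0, 2, 0, 1],  # variant 2: poorly tagged by the base array
    [2, 0, 0, 2, 0, 1],  # variant 3: in perfect LD with variant 2
    [0, 0, 1, 1, 2, 2],  # variant 4: poorly tagged by everything else
]
addon_tags = select_addon_tags(toy_genotypes, base_tags=[0], budget=2)
print(addon_tags)
```

In this toy example, variant 2 is picked first because it also covers variant 3, and variant 4 is picked second because no other variant tags it. A real design would additionally need to account for allele frequency, array probe constraints, and imputation-based rather than purely pairwise coverage, none of which are modelled here.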
