Abstract

Founder populations and large pedigrees offer many well-known advantages for genetic mapping studies, including cost-efficient study designs. Here, we describe PRIMAL (PedigRee IMputation ALgorithm), a fast and accurate pedigree-based phasing and imputation algorithm for founder populations. PRIMAL combines existing and original ideas, including a novel strategy for indexing identity-by-descent (IBD) segments based on clique graphs. Using 98 whole genome sequences, we imputed the genomes of 1,317 South Dakota Hutterites who had genome-wide genotypes for ~300,000 common single nucleotide variants (SNVs). By combining pedigree-based and LD-based imputation, we assigned 87% of genotypes with >99% accuracy across the full range of allele frequencies. The IBD cliques also allowed us to infer the parental origin of 83% of alleles, as well as the genotypes of deceased recent ancestors for whom no genotype data were available. This imputed data set will enable us to better study the relative contributions of rare and common variants to human phenotypes, as well as parent-of-origin effects of disease risk alleles, in >1,000 individuals at minimal cost.
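To make the idea of imputing through IBD sharing concrete, the following Python sketch groups haplotypes that are IBD at a single position and copies alleles from sequenced haplotypes to unsequenced members of the same group. It is a simplified illustration under stated assumptions, not PRIMAL's implementation: the haplotype labels, the segment format, and the functions `ibd_sets_at` and `impute_at` are hypothetical, groups are built with a union-find over pairwise segments rather than PRIMAL's clique-graph index, and groups whose sequenced members disagree are simply left missing. With error-free IBD calls, IBD at a given position is transitive, so each group forms a clique in the IBD graph.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: a haplotype is (individual ID, copy index 0 or 1).
Hap = Tuple[str, int]
# A pairwise IBD segment between two haplotypes, covering [start, end] in bp (inclusive).
Segment = Tuple[Hap, Hap, int, int]


def ibd_sets_at(position: int, segments: List[Segment]) -> List[List[Hap]]:
    """Group haplotypes that are IBD at `position` (connected components via union-find)."""
    parent: Dict[Hap, Hap] = {}

    def find(h: Hap) -> Hap:
        parent.setdefault(h, h)
        while parent[h] != h:
            parent[h] = parent[parent[h]]  # path halving keeps trees shallow
            h = parent[h]
        return h

    def union(a: Hap, b: Hap) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Only segments that span the target position link two haplotypes.
    for a, b, start, end in segments:
        if start <= position <= end:
            union(a, b)

    groups: Dict[Hap, List[Hap]] = defaultdict(list)
    for h in list(parent):
        groups[find(h)].append(h)
    return [sorted(g) for g in groups.values()]


def impute_at(position: int, segments: List[Segment],
              sequenced: Dict[Hap, str]) -> Dict[Hap, str]:
    """Copy a sequenced allele to every unsequenced haplotype in the same IBD set."""
    imputed: Dict[Hap, str] = {}
    for group in ibd_sets_at(position, segments):
        donors = {sequenced[h] for h in group if h in sequenced}
        if len(donors) != 1:
            continue  # no sequenced donor, or conflicting donors: leave missing
        allele = next(iter(donors))
        for h in group:
            if h not in sequenced:
                imputed[h] = allele
    return imputed


# Toy example: only individual A has been sequenced.
segments = [
    (("A", 0), ("B", 1), 100, 500),
    (("B", 1), ("C", 0), 300, 900),
    (("A", 1), ("D", 0), 0, 1000),
]
sequenced = {("A", 0): "G", ("A", 1): "T"}
print(impute_at(400, segments, sequenced))
# {('B', 1): 'G', ('C', 0): 'G', ('D', 0): 'T'}
```

At position 200, the segment shared by `("B", 1)` and `("C", 0)` does not apply, so `("C", 0)` stays unimputed while `("B", 1)` and `("D", 0)` are still filled in from individual A.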

Highlights

  • Although whole genome and whole exome sequencing now allow genetic studies of human diseases and traits at unprecedented resolution, their cost limits the size of the studied sample. To overcome this limitation and design cost-efficient studies, we developed a two-step method: sequencing relatively few members of a well-characterized founder population, followed by pedigree-based whole genome imputation of many other individuals with genome-wide genotype data

  • Parental origin was assigned to 83% of the alleles
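For intuition about what assigning parental origin involves, here is the textbook single-site trio rule as a short Python sketch. This is a swapped-in technique for illustration only, not the IBD-clique method the abstract describes, and `parent_of_origin` is a hypothetical helper. It shows why single-site logic leaves some alleles unassigned: when the child and both parents are heterozygous for the same two alleles, either parent could have transmitted either allele, and resolving such sites generally needs haplotype or IBD information from surrounding markers.

```python
from typing import Optional, Tuple

# An unphased genotype at one site, e.g. ("A", "G").
Genotype = Tuple[str, str]


def parent_of_origin(child: Genotype, father: Genotype,
                     mother: Genotype) -> Optional[Tuple[str, str]]:
    """Return (paternal allele, maternal allele) if Mendelian logic fixes it, else None."""
    a, b = child
    consistent = set()
    # Try both ways of splitting the child's two alleles between the parents.
    for paternal, maternal in ((a, b), (b, a)):
        if paternal in father and maternal in mother:
            consistent.add((paternal, maternal))
    # Exactly one consistent split: origin is resolved.
    # Zero (Mendelian error) or two (all three heterozygous): unresolved.
    return next(iter(consistent)) if len(consistent) == 1 else None


print(parent_of_origin(("A", "G"), ("A", "A"), ("A", "G")))  # ('A', 'G')
print(parent_of_origin(("A", "G"), ("A", "G"), ("A", "G")))  # None
```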

Introduction

Despite decreasing costs of whole exome and whole genome sequencing, the role of rare genetic variants in common disease risk remains hard to assess because such studies require very large sample sizes [1,2]. Just as mutations causing rare monogenic disorders can reach relatively common frequencies in founder populations [3,4,5,6], some of the rare variants that contribute to common complex diseases are expected to occur at higher frequencies in these populations. This provides a unique opportunity to study the relative roles of rare and common variants in common disease risk among individuals exposed to similar environments; a shared environment further minimizes the contribution of non-genetic factors to inter-individual variation in disease risk and facilitates the identification of disease-associated alleles.

