Abstract

Admixed populations can make an important contribution to the discovery of disease susceptibility genes if the parental populations exhibit substantial variation in susceptibility. Admixture mapping has been used successfully, but is not designed to cope with populations that have more than two or three ancestral populations. The inference of admixture proportions and local ancestry and the imputation of missing genotypes in admixed populations are crucial in both understanding variation in disease and identifying novel disease loci. These inferences make use of reference populations, and accuracy depends on the choice of ancestral populations. Using an insufficient or inaccurate ancestral panel can result in erroneously inferred ancestry and affect the detection power of GWAS and meta-analysis when using imputation. Current algorithms are inadequate for multi-way admixed populations. To address these challenges we developed PROXYANC, an approach to select the best proxy ancestral populations. From the simulation of a multi-way admixed population we demonstrate the capability and accuracy of PROXYANC and illustrate the importance of the choice of ancestry in both estimating admixture proportions and imputing missing genotypes. We applied this approach to a complex, uniquely admixed South African population. Using genome-wide SNP data from over 764 individuals, we accurately estimate the genetic contributions from the best ancestral populations: isiXhosa, ‡Khomani SAN, European, Indian, and Chinese. We also demonstrate that the ancestral allele frequency differences correlate with increased linkage disequilibrium in the South African population, which originates from admixture events rather than population bottlenecks.

Nomenclature

The collective term for people of mixed ancestry in southern Africa is “Coloured,” and this is officially recognized in South Africa as a census term, and for self-classification. Whilst we acknowledge that some cultures may use this term in a derogatory manner, these connotations are not present in South Africa, and are certainly not intended here.

Highlights

  • The field of population genetics has experienced a resurgence in the past few years due to access to extensive single nucleotide polymorphism data

  • Using the best proxy ancestral populations found by PROXYANC, we demonstrated that the ancestral allele frequency differences correlated with increased linkage disequilibrium (LD) in the South African Coloured population (SAC), indicating that increased admixture LD is present in this population, and the observed LD has its origin from admixture events

  • Proxy ancestral selection: we developed the method PROXYANC, which searches for the combination of reference populations that minimizes the genetic distance between the admixed population and a synthetic population, considering all possible synthetic populations formed as linear combinations of reference populations
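
The search described in the last highlight can be illustrated with a minimal sketch. This is not the PROXYANC implementation: the function names (`simplex_grid`, `distance`, `best_proxy_ancestors`), the mean squared allele-frequency difference used as the genetic distance, and the coarse grid search over mixing weights are all assumptions made for illustration. The sketch only shows the general idea of scoring every combination of candidate reference populations by how closely a weighted mixture of their allele frequencies matches the admixed population.

```python
from itertools import combinations


def simplex_grid(k, steps):
    """Yield integer tuples of length k whose entries sum to `steps`.

    Dividing each entry by the total number of steps gives non-negative
    mixing weights that sum to 1 (hypothetical helper).
    """
    if k == 1:
        yield (steps,)
        return
    for i in range(steps + 1):
        for rest in simplex_grid(k - 1, steps - i):
            yield (i,) + rest


def distance(admixed, refs, weights):
    """Mean squared difference between admixed and synthetic frequencies.

    Assumption: a simple stand-in for a genetic distance; the published
    method may use a different measure.
    """
    total = 0.0
    for j, p in enumerate(admixed):
        synthetic = sum(w * ref[j] for w, ref in zip(weights, refs))
        total += (p - synthetic) ** 2
    return total / len(admixed)


def best_proxy_ancestors(admixed, panel, k, steps=20):
    """Return (distance, population names, weights) for the best k-way mix.

    `panel` maps a population name to its per-SNP allele frequencies.
    Every k-subset of the panel and every weight vector on the grid is
    scored; the smallest distance wins.
    """
    best = None
    for names in combinations(sorted(panel), k):
        refs = [panel[n] for n in names]
        for counts in simplex_grid(k, steps):
            weights = [c / steps for c in counts]
            d = distance(admixed, refs, weights)
            if best is None or d < best[0]:
                best = (d, names, weights)
    return best


if __name__ == "__main__":
    # Toy allele frequencies at four SNPs (made-up numbers).
    panel = {
        "A": [0.1, 0.9, 0.5, 0.2],
        "B": [0.8, 0.2, 0.4, 0.6],
        "C": [0.5, 0.5, 0.5, 0.5],
    }
    # Constructed as 0.25 * A + 0.75 * B.
    admixed = [0.625, 0.375, 0.425, 0.5]
    print(best_proxy_ancestors(admixed, panel, k=2))
```

An exhaustive grid is only practical for a handful of candidate populations and small k; a constrained least-squares solver would be the natural replacement for the inner loop on real genome-wide data.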


Introduction

The field of population genetics has experienced a resurgence in the past few years due to access to extensive single nucleotide polymorphism (SNP) data. To understand the genetic variation observed at marker loci within and among populations, inferring both local ancestry and population structure from SNP genotypes is crucial. These inferences, including the imputation of missing genotypes in genome-wide association studies (GWAS), utilize panels of reference ancestral populations chosen by place of origin or by ethnic or continental affiliation [5,6,7,8,9,10,11,12,13]. The availability of high-throughput genotype data from various populations may facilitate the choice of the best proxy ancestry for a recently admixed population from a pool of reference populations. This choice is critical both in the study of population genetics and in identifying genes underlying ethnic differences in genetic disease risk [1,2,3,4]. These issues may affect the inference of ancestry and the detection power of GWAS and meta-analysis when using imputation in multi-way admixed populations.
