Abstract

The determination of the ancestry and genetic backgrounds of the subjects in genetic and general epidemiology studies is a crucial component in the analysis of relevant outcomes or associations. Although there are many methods for differentiating ancestral subgroups among individuals based on genetic markers, only a few of these methods provide actual estimates of the fraction of an individual’s genome that is likely to be associated with different ancestral populations. We propose a method for assigning ancestry that works in stages to refine estimates of ancestral population contributions to individual genomes. The method leverages genotype data in the public domain obtained from individuals with known ancestries. Although we showcase the method in the assessment of ancestral genome proportions using largely continental populations, the strategy can be used for assessing within-continent or more subtle ancestral origins with the appropriate data.

Highlights

  • Allele frequencies at most loci throughout the genome vary among populations (Cavalli-Sforza et al, 1994)

  • Most techniques used for assessing variation in genetic background and ancestry among a sample of individuals based on the observed genotypic profiles those individuals possess rely on “unsupervised” clustering approaches, whereby individuals in a sample with similar genotypic profiles are considered members of a particular ancestral group whose origins or geographic and historical context is not immediately obvious (Pritchard et al, 2000; Tang et al, 2005; Alexander et al, 2009)

  • The majority of studies that require this information use self-reported ancestry as a proxy for biogeographic ancestry. This practice has many limitations (Pfaff et al, 2001; Klimentidis et al, 2009; Tayo et al, 2011), especially for recently admixed individuals, such as Hispanics or African Americans, whose genetic ancestry has been shaped by admixture from several source continental populations and for whom the precise contribution from each source population is often unknown

Introduction

Allele frequencies at most loci throughout the genome vary among populations (Cavalli-Sforza et al, 1994). Most techniques used for assessing variation in genetic background and ancestry among a sample of individuals based on the observed genotypic profiles those individuals possess rely on “unsupervised” clustering approaches, whereby individuals in a sample with similar genotypic profiles are considered members of a particular ancestral group whose origins or geographic and historical context is not immediately obvious (Pritchard et al, 2000; Tang et al, 2005; Alexander et al, 2009). Individual ancestry estimation is necessary for relating phenotypes to the variation in genetic background (Allison et al, 2010; Fejerman et al, 2010; Kumar et al, 2010; Yang et al, 2011), as well as developing appropriate reference panels for, e.g., determining population-specific allele frequencies or searching for de novo

www.frontiersin.org

