Abstract

Ancestry informative single nucleotide polymorphisms (SNPs) can identify biogeographic ancestry (BGA); however, population substructure and relatively recent admixture can make differentiation difficult in heterogeneous Hispanic populations. Using unrelated individuals from the Genomic Origins and Admixture in Latinos dataset (GOAL, n = 160), we designed an 80 SNP panel (Setser80) that accurately depicts BGA through STRUCTURE and PCA. We compared the Setser80 to the Seldin and Kidd panels via resampling simulations, which model data based on allele frequencies. To determine robustness, we incorporated Admixed American 1000 Genomes populations (1000 G, n = 347) into a combined populations dataset. Using multinomial logistic regression (MLR), we compared the three panels on the combined dataset and found the following overall MLR classification accuracies: 93.2% Setser80, 87.9% Seldin panel, 71.4% Kidd panel. Naïve Bayesian classification gave similar results on the combined dataset: 91.5% Setser80, 84.7% Seldin panel, 71.1% Kidd panel. Although Peru and Mexico were absent from panel design, we achieved high classification accuracy on the combined populations for Peru (MLR = 100%, naïve Bayes = 98%) and Mexico (MLR = 90%, naïve Bayes = 83.4%), which is evidence of the portability of the Setser80. Our results indicate the Setser80 SNP panel can reliably classify BGA for individuals of presumed Hispanic origin.
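The resampling and naïve Bayesian classification steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the population labels, the four allele frequencies, the Hardy-Weinberg and independent-SNP assumptions, and the equal priors are all hypothetical choices made only for illustration.

```python
import math
import random

# Hypothetical allele frequencies per population at 4 SNPs
# (illustrative values only, not taken from the study).
FREQS = {
    "POP_A": [0.90, 0.15, 0.80, 0.30],
    "POP_B": [0.20, 0.85, 0.25, 0.75],
}

def simulate_genotype(freqs, rng):
    """Draw one diploid genotype (0, 1, or 2 allele copies per SNP),
    assuming Hardy-Weinberg equilibrium and independent SNPs."""
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]

def log_likelihood(genotype, freqs):
    """Log-probability of a genotype given allele frequencies (binomial, n = 2)."""
    total = 0.0
    for g, p in zip(genotype, freqs):
        total += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1 - p)
    return total

def classify(genotype, freq_table):
    """Naive Bayes with equal priors: return the most likely population."""
    return max(freq_table, key=lambda pop: log_likelihood(genotype, freq_table[pop]))

def accuracy(freq_table, n_per_pop=500, seed=0):
    """Simulate individuals from each population and score the classifier."""
    rng = random.Random(seed)
    correct = 0
    total = 0
    for pop, freqs in freq_table.items():
        for _ in range(n_per_pop):
            correct += classify(simulate_genotype(freqs, rng), freq_table) == pop
            total += 1
    return correct / total

if __name__ == "__main__":
    print(f"simulated classification accuracy: {accuracy(FREQS):.3f}")
```

With real data, the frequency table would be estimated from reference samples for each panel, and per-population accuracies could then be compared across panels in the same way.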

Highlights

  • Ancestry informative single nucleotide polymorphisms (SNPs) can identify biogeographic ancestry (BGA); however, population substructure and relatively recent admixture can make differentiation difficult in heterogeneous Hispanic populations

  • We evaluated the ability of a newly developed Hispanic AIMs panel versus the Seldin128[9] and Kidd55[11] panels to separate heterogeneous Hispanic populations in the Genomic Origins and Admixture in Latinos (GOAL) dataset using STRUCTURE[26] and principal components analysis (PCA)

  • Using the genetic proportions from STRUCTURE[26] for the Seldin128[9] and Kidd55[11] panels, HUR and COL separated predominantly into Cluster 1 (HUR: Seldin128 = 0.7274, Kidd55 = 0.7258; COL: Seldin128 = 0.5370, Kidd55 = 0.5311) (Table 1), but the remaining populations did not separate into distinct clusters

Summary

Introduction

Ancestry informative single nucleotide polymorphisms (SNPs) can identify biogeographic ancestry (BGA); however, population substructure and relatively recent admixture can make differentiation difficult in heterogeneous Hispanic populations. Ancestry informative marker (AIM) panels are “continental” in nature, focused on admixture mapping to determine from which of the six inhabited continents an individual has ancestry; these include Seldin128[9], Galanter et al.’s 446[10], Kidd55[11], EUROFORGEN[12], Genetic Atlas[13], Genographic Project[14], Cuba by Marcheco-Teruel et al.[15], and Cuba by Fortes-Lima et al.[16]. Although these studies assessed continental ancestry proportions (e.g. Seldin128[9]), highly differentiated populations may be detected within continental panels, even identifying admixed populations such as Gujarati Indians in Houston, TX and individuals of Mexican ancestry from Los Angeles, CA[17]. The size of the Genographic Project panel[14], the proprietary nature of the SNPs on the Genochip[22], and poor representation of the Western hemisphere have prompted us to create a small, efficient, and publicly available SNP panel concentrated on BGA of Central America, South America, and the Caribbean.
