Abstract

Background

The utilization of biological data to infer the geographic origins of human populations has been a long-standing quest for biologists and anthropologists. Several biogeographical analysis tools have been developed to infer the geographical origins of human populations from genetic data. However, owing to the inherent complexity of genetic information, these approaches are prone to misinterpretation. The Geographic Population Structure (GPS) algorithm is an admixture-based tool for biogeographical analyses and has been employed to geo-localize various populations worldwide. Here we sought to dissect its sensitivity and accuracy in localizing highly admixed groups. Given the complex history of population dispersal and gene flow in the Indian subcontinent, we employed the GPS tool to localize five South Asian populations from the 1000 Genomes Project (Punjabi, Gujarati, Tamil, Telugu and Bengali), some of whom were recent migrants to the USA and UK. As references, we used populations from the Indian subcontinent available in the Human Genome Diversity Panel (HGDP) together with previously described populations.

Results

Our findings demonstrate reasonably high GPS assignment accuracy even for recent migrant populations sampled elsewhere, namely the Tamil, Telugu and Gujarati individuals, of whom 96%, 87% and 79%, respectively, were positioned within 600 km of their native locations. By contrast, the absence of appropriate reference populations resulted in moderate-to-low precision in positioning the Punjabi and Bengali genomes.

Conclusions

Our findings indicate that the GPS approach is useful but likely overly dependent on the relative proportions of admixture in the reference populations for determining the biogeographical origins of test individuals. We conclude that further modifications are desirable to make this approach more suitable for highly admixed individuals.
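The positioning accuracy reported above is expressed as the share of individuals placed within 600 km of their native locations. The sketch below shows how such a metric can be computed, assuming great-circle (haversine) distance on a spherical Earth of radius 6371 km; the source does not state how distances were measured, and the native location and predicted positions in the usage lines are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def fraction_within(predicted, native, threshold_km=600.0):
    """Share of predicted (lat, lon) positions lying within threshold_km of the native location."""
    hits = sum(1 for lat, lon in predicted if haversine_km(lat, lon, *native) <= threshold_km)
    return hits / len(predicted)


# Hypothetical native location and predicted positions (illustrative values only).
native = (13.08, 80.27)
predicted = [(13.0, 80.0), (11.0, 78.0), (28.6, 77.2)]
print(f"{fraction_within(predicted, native):.0%} within 600 km")  # 67% within 600 km
```

Two of the three hypothetical positions lie a few hundred kilometres or less from the native point, while the third is more than 1700 km away, so the share is two thirds.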

Highlights

  • The utilization of biological data to infer the geographic origins of human populations has been a long-standing quest for biologists and anthropologists

  • For the global dataset (N = 1583), the lowest cross-validation error (CVE) was estimated for K = 13 (Additional file 1: Figure S1), while for the South Asians only dataset (N = 1064) the lowest CVE was estimated for K = 8 (Additional file 1: Figure S2)

  • Despite its success in tracing the ancestry of several modern-day populations, and its other likely applications, the Geographic Population Structure (GPS) approach, as our findings show, is heavily dependent on the relative proportions of admixture in the reference populations when articulating the population history and biogeographical origins of test individuals
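The second highlight selects the number of ancestral components K by the lowest cross-validation error (CVE). Assuming the CVE values come from ADMIXTURE run with its `--cv` option, which prints one line such as `CV error (K=2): 0.52715` per run, the selection can be sketched as below. The source does not name the tool, so the log format is an assumption, and the log excerpt is hypothetical with illustrative values, not the study's results.

```python
import re

# Assumed log format (ADMIXTURE with --cv): "CV error (K=2): 0.52715"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def parse_cv_errors(log_text):
    """Return {K: cross-validation error} for every CV line found in log_text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)


# Hypothetical log excerpt; the numbers are illustrative only.
example_log = """
CV error (K=2): 0.6102
CV error (K=3): 0.5874
CV error (K=4): 0.5791
CV error (K=5): 0.5833
"""
errors = parse_cv_errors(example_log)
print(best_k(errors))  # 4
```

Picking the minimum is the simplest rule; when several K values have nearly identical CVE, analysts sometimes prefer the smaller K, which this sketch does not attempt.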


Summary

Introduction

The utilization of biological data to infer the geographic origins of human populations has been a long-standing quest for biologists and anthropologists. Although the Geographic Population Structure (GPS) algorithm has been shown to outperform other existing methods for tracing the ancestry of human populations [2,3,4,5,6,7], it may not be accurate for recently admixed individuals and groups (up to 1000 years before present) [2, 8]. GPS infers the likely biogeographical affinity of a query population from its genomic similarity to reference populations, using the geographic locations (latitude and longitude) of those reference populations as anchors. Its utility and robustness in accurately localizing highly admixed populations, whose genetic structure has been shaped by significant demographic, biological and social factors, have remained largely unexplored.
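The idea of placing a query by its genomic similarity to geo-referenced reference populations can be illustrated with a deliberately simplified sketch. It assumes that each individual and each reference population is summarized by an admixture-proportion vector, and it places the query at the inverse-distance-weighted mean of the coordinates of its nearest reference populations in admixture space. This is a swapped-in illustration, not the published GPS algorithm, whose conversion of admixture distance into a geographic position differs in detail; the function names, population names, vectors and coordinates are all hypothetical.

```python
import math


def admixture_distance(p, q):
    """Euclidean distance between two admixture-proportion vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def locate(query, references, n_nearest=2, eps=1e-9):
    """Hypothetical helper: weighted mean (lat, lon) of the n_nearest reference populations.

    references is a list of (name, admixture_vector, (lat, lon)) tuples.
    """
    # Rank reference populations by how close their admixture profile is to the query's.
    ranked = sorted(references, key=lambda ref: admixture_distance(query, ref[1]))
    nearest = ranked[:n_nearest]
    # Closer reference populations receive larger (inverse-distance) weights.
    weights = [1.0 / (admixture_distance(query, vec) + eps) for _, vec, _ in nearest]
    total = sum(weights)
    # Plain averaging of latitude/longitude is only reasonable over a small region.
    lat = sum(w * coord[0] for w, (_, _, coord) in zip(weights, nearest)) / total
    lon = sum(w * coord[1] for w, (_, _, coord) in zip(weights, nearest)) / total
    return lat, lon


# Hypothetical reference panel with three ancestral components.
refs = [
    ("RefA", (0.8, 0.1, 0.1), (30.0, 75.0)),
    ("RefB", (0.2, 0.7, 0.1), (13.0, 80.0)),
    ("RefC", (0.1, 0.1, 0.8), (23.0, 90.0)),
]
query = (0.5, 0.4, 0.1)
print(locate(query, refs))  # approximately (21.5, 77.5)
```

The sketch also shows why reference coverage matters: the query can only be placed among the reference coordinates it is compared against, so a population lacking a suitable reference is pulled toward whichever references happen to be nearest in admixture space.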

