Abstract

Knowledge of human origins, migrations, and expansions is greatly enhanced by the availability of large datasets of genetic information from different populations and by the development of bioinformatic tools used to analyze the data. We present Ancestry Mapper, a method that we believe improves on existing approaches for assigning genetic ancestry to an individual and for studying the relationships between local and global populations. The principal function of Ancestry Mapper is to give each individual analyzed a genetic identifier, made up of just 51 genetic coordinates, that corresponds to the individual's relationship to the HGDP reference population. As a consequence, the Ancestry Mapper Id (AMid) has intrinsic biological meaning and provides a tool to measure similarity between world populations. We applied Ancestry Mapper to a dataset composed of the HGDP and HapMap data. The results show distinctions at the continental level while simultaneously giving details at the population level. We clustered the AMids of HGDP/HapMap and observed a recapitulation of human migrations: for a small number of clusters, individuals are grouped according to continental origin; for a larger number of clusters, regional and population distinctions are evident. Calculating distances between AMids allows us to infer ancestry. The number of coordinates is expandable, increasing the power of Ancestry Mapper. An R package called Ancestry Mapper is available to apply this method to any high-density genomic data set.

Highlights

  • Human genetic diversity is a fundamental question in biology, relevant to population genetics and genome-wide association studies

  • References for Ancestry Mapper: the Human Genome Diversity Project (HGDP) data set is composed of 938 individuals from 51 populations, genotyped using the Illumina platform (644,285 SNPs), and is available at http://hagsc.org/hgdp/files.html [7]

  • Ancestry Mapper uses a single individual as the reference for each HGDP population; 51 references form the basis of AM
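
To make the idea of reference-based coordinates concrete, here is a minimal Python sketch. It is a toy illustration under stated assumptions, not the published Ancestry Mapper formula: genotypes are assumed to be coded as 0/1/2 allele counts, each coordinate is assumed to come from a Euclidean distance to one reference individual, and the min-max rescaling to 0..100 is an illustrative choice. The names `euclidean`, `toy_identifier`, and the three toy references are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def toy_identifier(individual, references):
    """Return one coordinate per reference individual.

    Assumption for illustration only: 100 marks the closest reference
    and 0 the farthest, using min-max rescaling of the distances.
    """
    distances = [euclidean(individual, ref) for ref in references]
    lo, hi = min(distances), max(distances)
    if hi == lo:
        # All references are equally distant; no ordering is possible.
        return [100.0 for _ in distances]
    return [100.0 * (hi - d) / (hi - lo) for d in distances]

# Three hypothetical reference individuals typed at six SNPs
# (the real method uses 51 references and far more markers).
references = [
    [0, 0, 0, 0, 0, 0],
    [2, 2, 2, 0, 0, 0],
    [2, 2, 2, 2, 2, 2],
]
sample = [0, 0, 1, 0, 0, 0]

identifier = toy_identifier(sample, references)
print([round(c, 1) for c in identifier])

# Identifiers are ordinary vectors, so two individuals can be compared
# by the distance between their identifiers.
other = toy_identifier([2, 2, 2, 1, 2, 2], references)
print(round(euclidean(identifier, other), 1))
```

Because every individual is described against the same fixed references, identifiers computed in separate runs stay directly comparable, which is the property the highlights emphasize.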

Introduction

Human genetic diversity is a fundamental question in biology, relevant to population genetics and genome-wide association studies. Human diversity and ancestry assignment have been studied by two main methodologies: clustering and principal component analysis (PCA) [2]. In clustering, individuals are placed into groups based on similarities of SNP frequencies. PCA reduces the information contained in SNP frequencies to components that capture most of the genetic variability; one recent and popular implementation is Eigensoft [5]. Individuals are separated into distinct groups by plotting components against each other. This approach is data-set dependent because the principal components vary with the diversity and number of samples.
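
The data-set dependence of PCA can be seen in a small Python sketch. It uses plain power iteration on a covariance matrix, not Eigensoft, and the two-marker genotypes, group names, and the helper `first_principal_axis` are hypothetical values chosen only to make the effect visible.

```python
import math

def first_principal_axis(rows, iterations=200):
    """Leading principal axis of `rows`, found by power iteration
    on the sample covariance matrix."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Deterministic, normalized starting vector.
    v = [1.0 + 0.1 * j for j in range(m)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left to follow
        v = [x / norm for x in w]
    return v

# Toy genotypes at two markers (0/1/2 allele counts).
group_p = [[0, 0], [0, 0]]
group_q = [[2, 0], [2, 0]]
group_r = [[1, 2]] * 6  # a third, more heavily sampled group

axis_small = first_principal_axis(group_p + group_q)
axis_large = first_principal_axis(group_p + group_q + group_r)

print([round(x, 3) for x in axis_small])  # dominated by marker 0
print([round(x, 3) for x in axis_large])  # dominated by marker 1
```

With only groups P and Q, the first component follows the marker that separates them. Once the larger group R is added, the first component follows the marker that distinguishes R, and P and Q project onto nearly the same value. The same individuals therefore receive different coordinates depending on which other samples are analyzed with them.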

