HGDP and HapMap analysis by Ancestry Mapper reveals local and global population relationships.

Tiago R Magalhães, Darren J Fitzpatrick, João Sobral, Jillian P Casey, Regina Regan, Judith Conroy, Sean Ennis, Naisha Shah

doi:10.1371/journal.pone.0049438

Abstract

Knowledge of human origins, migrations, and expansions is greatly enhanced by the availability of large datasets of genetic information from different populations and by the development of bioinformatic tools to analyze them. We present Ancestry Mapper, a method that we believe improves on existing approaches for assigning genetic ancestry to an individual and for studying the relationships between local and global populations. Its principal function is to give each analyzed individual a genetic identifier, made up of just 51 genetic coordinates, that corresponds to the individual's relationship to the HGDP reference populations. As a consequence, the Ancestry Mapper Id (AMid) has intrinsic biological meaning and provides a tool for measuring similarity between world populations. We applied Ancestry Mapper to a dataset composed of the HGDP and HapMap data. The results show distinctions at the continental level while simultaneously giving details at the population level. When we clustered the AMids of the HGDP/HapMap individuals, we observed a recapitulation of human migrations: with a small number of clusters, individuals are grouped according to continental origin; with a larger number of clusters, regional and population distinctions become evident. Calculating distances between AMids allows us to infer ancestry. The number of coordinates is expandable, which increases the power of Ancestry Mapper. An R package called Ancestry Mapper is available to apply this method to any high-density genomic dataset.
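The abstract describes the AMid only in outline. The sketch below illustrates the idea under stated assumptions: genotypes are coded 0/1/2, the distance to each reference is the mean absolute genotype difference, and each individual's distances are rescaled so that the closest reference scores 100 and the farthest scores 0. These choices, the toy data, and the names `amid` and `assign_ancestry` are ours and are not necessarily those of the published method or R package. Ancestry is then inferred by finding the population whose mean AMid lies nearest in Euclidean distance.

```python
import numpy as np

def amid(individual, references):
    """Hypothetical AMid: one 0-100 coordinate per reference individual.

    individual: (n_snps,) genotypes coded 0/1/2 (assumed coding)
    references: (n_refs, n_snps) reference genotypes with the same coding
    """
    # Assumed distance: mean absolute genotype difference to each reference.
    dist = np.abs(references - individual).mean(axis=1)
    # Assumed scaling: closest reference -> 100, farthest reference -> 0.
    span = dist.max() - dist.min()
    if span == 0:
        # Equidistant from every reference, so no reference stands out.
        return np.full(len(references), 100.0)
    return 100.0 * (dist.max() - dist) / span

def assign_ancestry(query_amid, population_amids):
    """Return the population whose mean AMid is nearest (Euclidean) to the query."""
    names = list(population_amids)
    centroids = np.array([population_amids[name] for name in names])
    distances = np.linalg.norm(centroids - query_amid, axis=1)
    return names[int(np.argmin(distances))]

# Toy example: three hypothetical references over six SNPs.
refs = np.array([[0, 0, 0, 0, 0, 0],
                 [2, 2, 2, 2, 2, 2],
                 [0, 0, 0, 2, 2, 2]])
person = np.array([0, 0, 1, 2, 2, 2])
person_amid = amid(person, refs)  # approximately [0, 33.3, 100]

# Hypothetical population-mean AMids, one vector per population.
population_means = {
    "pop_0": np.array([100.0, 0.0, 0.0]),
    "pop_1": np.array([0.0, 100.0, 0.0]),
    "pop_2": np.array([0.0, 30.0, 100.0]),
}
print(assign_ancestry(person_amid, population_means))  # prints "pop_2"
```

Because every coordinate is tied to a named reference, a high value can be read directly as "genetically close to that reference", which is the kind of intrinsic meaning the abstract attributes to the AMid.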

Highlights

Human genetic diversity is a fundamental question in biology, relevant to population genetics and to genome-wide association studies.
References for Ancestry Mapper: the Human Genome Diversity Project (HGDP) data set comprises 938 individuals from 51 populations, genotyped on the Illumina platform (644,285 SNPs), and is available at http://hagsc.org/hgdp/files.html [7].
Ancestry Mapper uses a single individual as the reference for each HGDP population; these 51 references form the basis of Ancestry Mapper (AM).
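The highlights state that AM uses one reference per HGDP population but do not say here how that reference is obtained. A minimal sketch, assuming the reference is a synthetic "consensus" individual carrying the most frequent genotype at each SNP among the population's members and that there are no missing genotype calls. This construction and the helper names `consensus_reference` and `build_references` are our assumptions, not the published procedure.

```python
import numpy as np

def consensus_reference(genotypes):
    """Most frequent genotype (0, 1 or 2) at each SNP across a population.

    genotypes: (n_individuals, n_snps) array coded 0/1/2, no missing calls (assumption).
    Ties resolve to the lower genotype because argmax returns the first maximum.
    """
    # counts[g, j] = number of individuals with genotype g at SNP j.
    counts = np.stack([(genotypes == g).sum(axis=0) for g in (0, 1, 2)])
    return counts.argmax(axis=0)

def build_references(genotypes, labels):
    """Hypothetical helper: one consensus reference vector per population label."""
    labels = np.asarray(labels)
    return {str(pop): consensus_reference(genotypes[labels == pop])
            for pop in np.unique(labels)}
```

Applied to the HGDP panel, a helper like this would return 51 reference vectors, one per population, which could then serve as the rows of the reference matrix from which each individual's coordinates are computed.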

Summary

Introduction

Human genetic diversity is a fundamental question in biology, relevant to population genetics and to genome-wide association studies. Human diversity and ancestry assignment have been studied with two main methodologies: clustering and principal component analysis (PCA) [2]. In clustering, individuals are placed into groups based on similarities in SNP frequencies. PCA reduces the information contained in SNP frequencies to a small number of components that capture most of the genetic variability; one recent and popular implementation is Eigensoft [5]. Individuals are then separated into distinct groups by plotting components against each other. This approach is data-set dependent, because the principal components vary with the diversity and number of samples analyzed.
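The data-set dependence of PCA can be illustrated with a toy example. The sketch below builds three idealized, hypothetical populations (A differs from B at SNPs 0 to 9; C differs at SNPs 10 to 19), computes the first principal axis with NumPy's SVD on A+B alone and on A+B+C, and compares the two axes. The populations, SNP counts, and function names are illustrative assumptions; real genotype data would also need missing-data handling.

```python
import numpy as np

def first_pc(genotypes):
    """Leading principal axis (unit vector over SNPs) of a genotype matrix via SVD."""
    centered = genotypes - genotypes.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def idealized_population(n, marker_snps, n_snps=20):
    """n identical toy individuals: genotype 2 at marker_snps, 0 elsewhere."""
    row = np.zeros(n_snps)
    row[np.asarray(list(marker_snps), dtype=int)] = 2
    return np.tile(row, (n, 1))

pop_a = idealized_population(10, range(0, 10))
pop_b = idealized_population(10, [])
pop_c = idealized_population(30, range(10, 20))

pc1_ab = first_pc(np.vstack([pop_a, pop_b]))
pc1_abc = first_pc(np.vstack([pop_a, pop_b, pop_c]))

# The sign of a principal axis is arbitrary, so compare absolute cosine similarity.
similarity = abs(pc1_ab @ pc1_abc)
print(f"|cos(PC1 on A+B, PC1 on A+B+C)| = {similarity:.2f}")
```

A similarity well below 1 means the leading axis has rotated: adding population C changes PC1 even though no individual in A or B changed. A coordinate defined as a distance to a fixed reference individual does not change when other samples are added, which appears to be the motivation for anchoring AMids to fixed references.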

Journal: PloS one	Publication Date: Nov 26, 2012
Citations: 14	License type: CC BY 4.0

Similar Papers

Fast automatic estimation of the number of clusters from the minimum inter-center distance for k-means clustering
Avisek Gupta ... Swagatam Das
Pattern Recognition Letters | Vol. 116 | 13 Sep 2018

An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework
Jakrarin Therdphapiyanak ... Krerk Piromsopa
01 May 2013

Dynamic parallel K-Means Algorithm Based On Dunn’s Index Method
Hitesh Kumari Yadav ... Sunil Dhankar
International Journal Of Engineering And Computer Science | Vol. 5 | 29 Feb 2016

Geospatial data of freshwater habitats for macroecological studies: an example with freshwater fishes
Luis González Vilas ... Jorge M Lobo
International Journal of Geographical Information Science | Vol. 30 | 29 Jul 2015
