Abstract

Multivariate statistical techniques such as principal components analysis (PCA) and multidimensional scaling (MDS) have been widely used to summarize the structure of human genetic variation, often in easily visualized two-dimensional maps. Many recent studies have reported similarity between geographic maps of population locations and MDS or PCA maps of genetic variation inferred from single-nucleotide polymorphisms (SNPs). However, this similarity has been evident primarily in a qualitative sense; and, because different multivariate techniques and marker sets have been used in different studies, it has not been possible to formally compare genetic variation datasets in terms of their levels of similarity with geography. In this study, using genome-wide SNP data from 128 populations worldwide, we perform a systematic analysis to quantitatively evaluate the similarity of genes and geography in different geographic regions. For each of a series of regions, we apply a Procrustes analysis approach to find an optimal transformation that maximizes the similarity between PCA maps of genetic variation and geographic maps of population locations. We consider examples in Europe, Sub-Saharan Africa, Asia, East Asia, and Central/South Asia, as well as in a worldwide sample, finding that significant similarity between genes and geography exists in general at different geographic levels. The similarity is highest in our examples for Asia and, once highly distinctive populations have been removed, Sub-Saharan Africa. Our results provide a quantitative assessment of the geographic structure of human genetic variation worldwide, supporting the view that geography plays a strong role in giving rise to human population structure.
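The Procrustes step described above can be illustrated with a minimal sketch. It assumes two arrays of matching two-dimensional coordinates (one geographic, one from PCA), superimposes them using translation, uniform scaling, and rotation or reflection, and reports a similarity score. The function names, the score definition (the square root of one minus the minimized sum of squared differences after standardization), and the permutation test are illustrative assumptions, not details taken from the text above.

```python
import numpy as np


def procrustes_similarity(geo, pcs):
    """Superimpose `pcs` onto `geo` (both (n, 2) arrays); return (similarity, fitted).

    Assumption: similarity is defined here as sqrt(1 - D), where D is the
    minimized sum of squared differences after both maps are centered and
    scaled to unit Frobenius norm.
    """
    X = np.asarray(geo, dtype=float)
    Y = np.asarray(pcs, dtype=float)
    # Remove location and overall size from both maps.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)
    # The best orthogonal transform (rotation or reflection) comes from the SVD.
    U, s, Vt = np.linalg.svd(Y.T @ X)
    R = U @ Vt
    scale = s.sum()  # optimal uniform scaling for the standardized maps
    fitted = scale * (Y @ R)
    disparity = np.sum((X - fitted) ** 2)
    similarity = np.sqrt(max(0.0, 1.0 - disparity))
    return similarity, fitted


def permutation_p_value(geo, pcs, n_perm=999, seed=0):
    """Hypothetical significance check: shuffle which PCA point goes with which location."""
    rng = np.random.default_rng(seed)
    pcs = np.asarray(pcs, dtype=float)
    observed, _ = procrustes_similarity(geo, pcs)
    hits = sum(
        procrustes_similarity(geo, pcs[rng.permutation(len(pcs))])[0] >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    rng = np.random.default_rng(42)
    geo = rng.normal(size=(15, 2))
    pcs = geo + 0.3 * rng.normal(size=(15, 2))
    score, _ = procrustes_similarity(geo, pcs)
    print(score, permutation_p_value(geo, pcs))
```

Allowing reflections matters here because the sign of each principal component is arbitrary, so a PCA map may be a mirror image of the geographic map. A score near 1 indicates that the two maps almost coincide after the transformation.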

Highlights

  • The geographic structure of human genetic variation has long been of interest for its implications for studying human evolutionary history [1,2,3,4,5]

  • We combine genome-wide single-nucleotide polymorphism (SNP) data from more than 100 populations worldwide to perform a formal comparison between genes and geography in different regions

  • By examining a worldwide sample and samples from Europe, Sub-Saharan Africa, Asia, East Asia, and Central/South Asia, we find that significant similarity between genes and geography exists in general in different geographic regions and at different geographic levels

Introduction

The geographic structure of human genetic variation has long been of interest for its implications for studying human evolutionary history [1,2,3,4,5]. The expansion of population-genetic datasets has contributed to an increase in geographic investigations of human genetic variation, often on the basis of classic multivariate statistical techniques such as PCA and MDS [6,7,8,9,10]. The population structure of genetic variation is often summarized in two-dimensional statistical maps obtained from the first two components of PCA or MDS. For large-scale single-nucleotide polymorphism (SNP) data, PCA and MDS are popular because of their computational efficiency and high level of resolution in decomposing the complex structure of human genetic variation [12,14]. Results produced by PCA and MDS are very similar to each other [15].
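As a rough illustration of how a two-dimensional PCA map can be obtained from SNP data, the sketch below assumes a hypothetical genotype matrix with one row per individual and one column per SNP, coded as 0, 1, or 2 copies of an allele. The function name `pca_map` and the choice to center each SNP without further normalization are assumptions made for this example. Published analyses may standardize genotypes differently.

```python
import numpy as np


def pca_map(genotypes, n_components=2):
    """Return (coords, explained) for an (individuals x SNPs) genotype matrix.

    coords:    (n_individuals, n_components) principal component scores
    explained: fraction of total variance captured by each returned component
    """
    G = np.asarray(genotypes, dtype=float)
    # Center each SNP column so the components describe variation around the mean.
    G = G - G.mean(axis=0)
    # Thin SVD: the rows of U scaled by the singular values are the PC scores.
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    coords = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return coords, explained[:n_components]


if __name__ == "__main__":
    # Simulated data for illustration only: two groups with different allele frequencies.
    rng = np.random.default_rng(0)
    group_a = rng.binomial(2, 0.2, size=(10, 300))
    group_b = rng.binomial(2, 0.8, size=(10, 300))
    coords, explained = pca_map(np.vstack([group_a, group_b]))
    print(coords[:3], explained)
```

The first two columns of `coords` give each individual a position on the kind of two-dimensional statistical map described above. Population-level maps can then be formed by averaging the coordinates of individuals within each population, which is one possible convention rather than a prescribed one.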
