Genotypes of informative loci from 1000 Genomes data allude evolution and mixing of human populations

Sridevi Padakanti, Chen-Hsiang Yeang, Yan-Bin Chen, Khong-Loon Tiong

doi:10.1038/s41598-021-97129-2

Open Access

https://doi.org/10.1038/s41598-021-97129-2

Abstract

Principal Component Analysis (PCA) projects high-dimensional genotype data into a few components that discern populations. Ancestry Informative Markers (AIMs) are a small subset of SNPs capable of distinguishing populations. We integrate these two approaches by proposing an algorithm to identify necessary informative loci, whose removal from the data deteriorates the PCA structure. Unlike classical AIMs, necessary informative loci densely cover the genome and hence can illuminate the evolution and mixing history of populations. We conduct a comprehensive analysis of the genotype data of the 1000 Genomes Project using necessary informative loci. Projections along the top seven principal components demarcate populations at distinct geographic levels. Millions of necessary informative loci along each PC are identified. Population identities along each PC are approximately determined by weighted sums of minor (or major) alleles over the informative loci. Variations of allele frequencies are aligned with the history and direction of population evolution. The population distribution of projections along the top three PCs is recapitulated by a simple demographic model based on several waves of founder population separation and mixing. Informative loci are concentrated at particular genomic locations and are functionally enriched. Genes at two hot spots encompassing dense PC 7 informative loci exhibit differential expression among European populations. The mosaic of local ancestry in the genome of a mixed descendant from multiple populations can be inferred from partial PCA projections of informative loci. Finally, informative loci derived from the 1000 Genomes data accurately predict the projections of an independent genotype dataset of South Asians. These results demonstrate the utility and relevance of informative loci for investigating human evolution.

Highlights

Principal Component Analysis (PCA) projects high-dimensional genotype data into a few components that discern populations
We introduce a new concept of necessary informative loci based on PCA projections of genotype data across populations
PCA projections of the data restricted to ancestry informative markers (AIMs) approximate those of the complete data
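
The idea of loci whose removal deteriorates the PCA structure can be illustrated with a small synthetic example. The sketch below is not the authors' algorithm: the function names (`top_pc_scores`, `pc1_agreement`), the two-population toy data, and the use of an absolute correlation between PC 1 scores as the measure of "deterioration" are all assumptions made for illustration.

```python
import numpy as np

def top_pc_scores(genotypes, n_components=1):
    """Return PCA scores (individuals x components) of a genotype matrix."""
    # Center each SNP column before the decomposition.
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def pc1_agreement(genotypes, keep_mask):
    """Absolute correlation between PC 1 scores of the full and reduced data."""
    full = top_pc_scores(genotypes)[:, 0]
    reduced = top_pc_scores(genotypes[:, keep_mask])[:, 0]
    return abs(np.corrcoef(full, reduced)[0, 1])

# Hypothetical toy data: two populations of 30 individuals and 300 SNPs,
# where only the first 30 SNPs differ in allele frequency.
rng = np.random.default_rng(0)
freq_a = np.full(300, 0.2)
freq_b = freq_a.copy()
freq_b[:30] = 0.8
genotypes = np.vstack([
    rng.binomial(2, freq_a, size=(30, 300)),
    rng.binomial(2, freq_b, size=(30, 300)),
]).astype(float)

# Remove the differentiated SNPs, or an equal number of undifferentiated ones.
drop_informative = np.ones(300, dtype=bool)
drop_informative[:30] = False
drop_neutral = np.ones(300, dtype=bool)
drop_neutral[30:60] = False

agree_informative_removed = pc1_agreement(genotypes, drop_informative)
agree_neutral_removed = pc1_agreement(genotypes, drop_neutral)
print(agree_informative_removed, agree_neutral_removed)
```

In this toy setting, dropping the differentiated SNPs should disturb the PC 1 projection far more than dropping the same number of undifferentiated SNPs, which is the intuition behind calling such loci necessary.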

Summary

Introduction

Principal Component Analysis (PCA) projects high-dimensional genotype data into a few components that discern populations. In contrast to PCA, where population differences in the projections are distributed over millions of SNPs, AIM studies typically identify a small number of SNPs sufficient to delineate these populations with high accuracy. This apparent paradox is due to strong correlations among the many SNPs in linkage disequilibrium (LD). One prior approach calculates the weights of SNPs on principal components from their loadings (coefficients) in the corresponding Singular Value Decomposition (SVD) and randomly samples a few AIMs with probabilities proportional to those weights. Although such approaches successfully incorporate PCA information to identify AIMs, they still aim to find a few markers sufficient to approximate the PCA structure of the complete genotype data. Loci that reveal the recombination history of individuals or populations are included in the necessary informative loci.
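
The loading-weighted sampling of AIMs described above can be sketched as follows. This is a minimal illustration, not the cited method's implementation: the function name `sample_markers_by_loading`, the choice of squared loadings summed over the top components as the weight, and the synthetic two-population genotype matrix are assumptions.

```python
import numpy as np

def sample_markers_by_loading(genotypes, n_components=1, n_markers=10, seed=0):
    """Sample SNP indices with probability proportional to SVD loading weights."""
    rng = np.random.default_rng(seed)
    # Center each SNP column so the SVD corresponds to PCA.
    centered = genotypes - genotypes.mean(axis=0)
    # Rows of vt hold the SNP loadings of each principal component.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Assumed weighting: squared loadings summed over the top components.
    weights = (vt[:n_components] ** 2).sum(axis=0)
    probs = weights / weights.sum()
    # Draw distinct SNP indices according to the normalized weights.
    return rng.choice(genotypes.shape[1], size=n_markers, replace=False, p=probs)

# Hypothetical toy data: two populations of 20 individuals and 200 SNPs,
# where only the first 20 SNPs differ in allele frequency.
data_rng = np.random.default_rng(1)
freq_a = np.full(200, 0.3)
freq_b = freq_a.copy()
freq_b[:20] = 0.9
genotypes = np.vstack([
    data_rng.binomial(2, freq_a, size=(20, 200)),
    data_rng.binomial(2, freq_b, size=(20, 200)),
]).astype(float)

picked = sample_markers_by_loading(genotypes, n_components=1, n_markers=10)
print(sorted(picked.tolist()))
```

Because the sampling is random, the selected panel is small by design and only approximates the PCA structure of the full data, which is the limitation the necessary informative loci are meant to address.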

Journal: Scientific Reports
Publication Date: Sep 7, 2021
Citations: 1
License type: open-access

Similar Papers

Measuring European Population Stratification with Microarray Genotype Data
Marc Bauchet ... Mark D Shriver
The American Journal of Human Genetics | VOL. 80
01 May 2007

Optimal selection of genetic variants for adjustment of population stratification in European association studies.
Regina Brinster ... Justo Lorenzo Bermejo
Briefings in Bioinformatics | VOL. 21
13 Mar 2019

Application of Ancestry Informative Markers to Association Studies in European Americans
Michael F Seldin ... Alkes L Price
PLoS Genetics | VOL. 4
01 Jan 2008

Empirical testing of a 23-AIMs panel of SNPs for ancestry evaluations in four major US populations
Xiangpei Zeng ... Jennifer D Churchill
International Journal of Legal Medicine | VOL. 130
25 Feb 2016
