Abstract
The genetic diversity of humans, like many species, has been shaped by a complex pattern of population separations followed by isolation and subsequent admixture. This pattern, reaching at least as far back as the appearance of our species in the paleontological record, has left its traces in our genomes. Reconstructing a population’s history from these traces is a challenging problem. Here we present a novel approach based on the Multiple Sequentially Markovian Coalescent (MSMC) to analyze the separation history between populations. Our approach, called MSMC-IM, uses an improved implementation of the MSMC (MSMC2) to estimate coalescence rates within and across pairs of populations, and then fits a continuous Isolation-Migration model to these rates to obtain a time-dependent estimate of gene flow. We show, using simulations, that our method can identify complex demographic scenarios involving post-split admixture or archaic introgression. We apply MSMC-IM to whole genome sequences from 15 worldwide populations, tracking the process of human genetic diversification. We detect traces of extremely deep ancestry between some African populations, with around 1% of ancestry dating to divergences older than a million years ago.
Highlights
Genomes harbor rich information about population history, encoded in patterns of mutations and recombinations
We find evidence for remarkably deep population structure in some African population pairs, suggesting that deep ancestry dating to one million years ago and older is still present in human populations in small amounts today
The second aspect, which entails reconstructing the timing and dynamics of population separation requires a non-trivial choice of parameterization: While methods like diCal2 [5], as well as many methods based on the joint site frequency spectrum [8,9,10,11] use an explicit population model with split times, migration rates or admixture events, Multiple Sequentially Markovian Coalescent (MSMC) [4] introduced the concept of the relative cross coalescence rate to capture population separations in a continuously parameterized fashion
Summary
Genomes harbor rich information about population history, encoded in patterns of mutations and recombinations. One important innovation was the Sequentially Markovian Coalescent (SMC) model [1,2], which is an approximate form of the ancestral recombination graph that can be fitted as a Hidden Markov model along the sequence This approach has been used to infer demographic history in methods like PSMC [3], MSMC [4], diCal [5,6] and SMC++ [7]. The second aspect, which entails reconstructing the timing and dynamics of population separation requires a non-trivial choice of parameterization: While methods like diCal2 [5], as well as many methods based on the joint site frequency spectrum [8,9,10,11] use an explicit population model with split times, migration rates or admixture events, MSMC [4] introduced the concept of the relative cross coalescence rate to capture population separations in a continuously parameterized fashion. It is difficult to interpret the cross-coalescence rate in terms of actual historical events
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have