MSMC and MSMC2: The Multiple Sequentially Markovian Coalescent.

Stephan Schiffels, Ke Wang

doi:10.1007/978-1-0716-0199-0_7

Abstract

The Multiple Sequentially Markovian Coalescent (MSMC) is a population genetic method and software for inferring demographic history and population structure through time from genome sequences. Here we describe the main program MSMC and its successor MSMC2. We go through all the necessary steps of processing genomic data, from BAM files all the way to generating plots of inferred population size and separation histories. Some background on the methodology itself is provided, as well as bash scripts and Python source code to run the necessary programs. The reader is also referred to community resources such as a mailing list and GitHub repositories for further advice.

Highlights

Multiple Sequentially Markovian Coalescent (MSMC) [1] is an algorithm and program for analyzing genome sequence data to answer two basic questions: How did the effective population size of a population change through time? When and how did two populations separate from each other in the past? As input data, MSMC analyzes multiple phased genome sequences simultaneously to fit a demographic model to the data. MSMC models an approximate version of the coalescent under recombination across the input sequences.
As introduced in Schiffels and Durbin [1], to simplify interpretation of the three inferred rates, we can plot a simple summary by taking the ratio of the across-rate and the mean within-rate, which is termed the relative cross coalescence rate (Fig. 3b).
A tutorial can be found at https://github.com/stschiff/msmctools/blob/master/msmc-tutorial/guide.md and general documentation can be found within each package.

Summary

Introduction

MSMC [1] is an algorithm and program for analyzing genome sequence data to answer two basic questions: How did the effective population size of a population change through time? When and how did two populations separate from each other in the past? As input data, MSMC analyzes multiple phased genome sequences simultaneously (separated into haplotypes, i.e., maternal and paternal haploid chromosomes) to fit a demographic model to the data.

As introduced in Schiffels and Durbin [1], to simplify interpretation of the three inferred rates (the coalescence rate within each of the two populations and the rate across them), we can plot a simple summary by taking the ratio of the across-rate and the mean within-rate, which is termed the relative cross coalescence rate (rCCR) (Fig. 3b). This summary variable ranges between 0 and 1, and indicates when and how the two populations diverged.

MSMC is computationally intensive and is, for all practical purposes, limited to analyzing at most eight haplotypes. Even within this scope, coalescence rate estimates for more than four haplotypes are sometimes biased (see, for example, Fig. 2, red curve), with some systematic over- and underestimation of the true coalescence rates.
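The relative cross coalescence rate described above is simple arithmetic on the three inferred rates, and a short sketch may make the definition concrete. The function name, argument names, and the example numbers below are assumptions made for illustration only; they are not taken from the MSMC software or its output format. The sketch assumes the three rates have already been read into equal-length lists, one value per time segment.

```python
import math


def relative_cross_coalescence_rate(within_1, within_2, across):
    """Return, per time segment, the across-rate divided by the mean within-rate.

    All three arguments are equal-length sequences of coalescence rates.
    The names are hypothetical and chosen only for this illustration.
    """
    if not (len(within_1) == len(within_2) == len(across)):
        raise ValueError("all rate lists must have the same length")

    rccr = []
    for w1, w2, a in zip(within_1, within_2, across):
        mean_within = (w1 + w2) / 2.0
        # Guard against division by zero in segments with no within-rate.
        rccr.append(a / mean_within if mean_within > 0 else math.nan)
    return rccr


# Made-up rates for three time segments, ordered from recent to ancient.
within_pop1 = [2.0, 2.0, 4.0]
within_pop2 = [2.0, 6.0, 4.0]
across_pops = [0.0, 2.0, 4.0]

print(relative_cross_coalescence_rate(within_pop1, within_pop2, across_pops))
# prints [0.0, 0.5, 1.0]
```

In this made-up example, a value near 0 in the most recent segment corresponds to fully separated populations, and a value near 1 in the oldest segment corresponds to a single ancestral population, matching the interpretation of rCCR given in the text.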

Software Overview

MSMC-Tools

Diploid Data

Phasing

High Coverage Data

Input Data Format

Generating VCF and Mask Files from Individual BAM Files

Combining Multiple Individuals into One Input File

Resource Requirements

Plotting Results

Bootstrapping

Controlling Time


Journal: Methods in molecular biology (Clifton, N.J.)
Publication Date: Jan 1, 2020
Citations: 133
License type: CC BY 4.0