Abstract

The Multiple Sequentially Markovian Coalescent (MSMC) is a population genetic method and software for inferring demographic history and population structure through time from genome sequences. Here we describe the main program MSMC and its successor MSMC2. We go through all the necessary steps of processing genomic data, from BAM files all the way to generating plots of inferred population size and separation histories. Some background on the methodology itself is provided, as well as bash scripts and Python source code to run the necessary programs. The reader is also referred to community resources such as a mailing list and GitHub repositories for further advice.

Highlights

  • Multiple Sequentially Markovian Coalescent (MSMC) [1] is an algorithm and program for analyzing genome sequence data to answer two basic questions: How did the effective population size of a population change through time? When and how did two populations separate from each other in the past? As input data, MSMC analyzes multiple phased genome sequences simultaneously to fit a demographic model to the data. MSMC models an approximate version of the coalescent under recombination across the input sequences

  • As introduced in Schiffels and Durbin [1], to simplify interpretation of the three inferred rates, we can plot a simple summary by taking the ratio of the across-rate and the mean within-rate, which is termed the relative cross coalescence rate (Fig. 3b)

  • A tutorial can be found at https://github.com/stschiff/msmctools/blob/master/msmc-tutorial/guide.md and general documentation can be found within each package

Summary

Introduction

MSMC [1] is an algorithm and program for analyzing genome sequence data to answer two basic questions: How did the effective population size of a population change through time? When and how did two populations separate from each other in the past? As input data, MSMC analyzes multiple phased genome sequences simultaneously (separated into haplotypes, i.e. maternal and paternal haploid chromosomes) to fit a demographic model to the data. As introduced in Schiffels and Durbin [1], to simplify interpretation of the three inferred rates, we can plot a simple summary by taking the ratio of the across-rate and the mean within-rate, which is termed the relative cross coalescence rate (rCCR) (Fig. 3b). This summary variable ranges between 0 and 1, and indicates when and how the two populations diverged. MSMC is computationally intensive and, for all practical purposes, limited to analyzing eight haplotypes at most. Even within this scope, we see that coalescence rate estimates for more than four haplotypes are sometimes biased (see, for example, Fig. 2, red curve), with some systematic over- and underestimations of the true coalescence rates.
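The rCCR summary described above can be sketched in a few lines of Python. This is only an illustration of the ratio as stated in the text (across-rate divided by the mean of the two within-rates), not the code shipped with MSMC or MSMC-Tools. The function name, argument names, and the example rate values are hypothetical and chosen for readability.

```python
from typing import List, Sequence


def relative_cross_coalescence_rate(
    within_1: Sequence[float],
    within_2: Sequence[float],
    across: Sequence[float],
) -> List[float]:
    """Return across / mean(within_1, within_2) for each time interval.

    All three inputs are per-interval coalescence rate estimates of equal
    length. Names are illustrative, not taken from the MSMC output format.
    """
    if not (len(within_1) == len(within_2) == len(across)):
        raise ValueError("all three rate series must have the same length")

    rccr = []
    for w1, w2, a in zip(within_1, within_2, across):
        mean_within = (w1 + w2) / 2.0
        # Guard against division by zero when both within-rates are zero.
        rccr.append(a / mean_within if mean_within > 0 else float("nan"))
    return rccr


# Made-up rates for three time intervals, ordered from recent to ancient.
within_pop1 = [2.0, 2.0, 4.0]
within_pop2 = [2.0, 4.0, 4.0]
across_pops = [0.0, 1.5, 4.0]

print(relative_cross_coalescence_rate(within_pop1, within_pop2, across_pops))
```

In this toy input the ratio rises from 0 in the most recent interval to 1 in the most ancient one, which is the pattern expected when two populations are fully separated recently and behave as a single population further back in time.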

Software Overview
MSMC-Tools
Diploid Data
Phasing
High Coverage Data
Input Data Format
Generating VCF and Mask Files from Individual BAM Files
Combining Multiple Individuals into One Input File
Resource Requirements
Plotting Results
Bootstrapping
Controlling Time