Abstract

We introduce ABLE (Approximate Blockwise Likelihood Estimation), a novel simulation-based composite likelihood method that uses the blockwise site frequency spectrum to jointly infer past demography and recombination. ABLE is explicitly designed for a wide variety of data, from unphased diploid genomes to genome-wide multi-locus data (for example, RADSeq), and can accommodate arbitrarily large samples. We use simulations to demonstrate the accuracy of this method in inferring complex histories of divergence and gene flow, and we reanalyze whole-genome data from two species of orangutan. ABLE is available for download at https://github.com/champost/ABLE.

Highlights

  • Demographic history has played a major role in shaping genetic variation

  • Following [22], the blockwise SFS (bSFS) is essentially a frequency spectrum of site frequency spectrum types across blocks and can be thought of as a straightforward extension of the SFS that accounts for linkage over a fixed length of sequence block (Fig. 1a)

  • The bSFS readily extends to samples from multiple populations where the entries of k are counts of mutation types defined by the joint SFS [6]
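To make the bSFS definition concrete, here is a minimal sketch for a single population. It is an illustration under stated assumptions, not ABLE's implementation: the function name `block_sfs`, its parameters, and the toy data are hypothetical, alleles are assumed to be polarized (0 = ancestral, 1 = derived), and the sequence is cut into non-overlapping blocks of fixed length. Each block yields a vector k of mutation-type counts, and the bSFS tallies how often each distinct k occurs across blocks.

```python
from collections import Counter


def block_sfs(haplotypes, positions, block_len, seq_len):
    """Tally per-block SFS vectors across fixed-length blocks (illustrative only).

    haplotypes: list of n equal-length lists of 0/1 alleles, one per sampled copy
                (assumption: 0 = ancestral, 1 = derived)
    positions:  0-based coordinate of each variable site (one per column)
    block_len:  block length in bases
    seq_len:    total sequence length in bases
    Returns a Counter mapping each vector k to the number of blocks showing it.
    """
    n = len(haplotypes)
    n_blocks = seq_len // block_len
    # One vector per block; entry i-1 counts sites where i of the n copies are derived.
    per_block = [[0] * (n - 1) for _ in range(n_blocks)]
    for col, pos in enumerate(positions):
        block = pos // block_len
        if block >= n_blocks:
            continue  # ignore sites in a trailing partial block
        derived = sum(h[col] for h in haplotypes)
        if 0 < derived < n:  # skip sites that are monomorphic in the sample
            per_block[block][derived - 1] += 1
    return Counter(tuple(k) for k in per_block)


# Toy data (hypothetical): 4 sampled copies, 5 variable sites, three 10-base blocks.
haplotypes = [
    [1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
]
positions = [1, 3, 12, 15, 27]
bsfs = block_sfs(haplotypes, positions, block_len=10, seq_len=30)
for k, count in sorted(bsfs.items()):
    print(k, count)
# (1, 0, 0) 1
# (1, 1, 0) 2
```

Two of the three blocks carry one singleton and one doubleton, so the configuration (1, 1, 0) has a count of 2. An ordinary SFS would pool the five sites and lose this within-block association, which is the linkage information the bSFS retains.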


Introduction

Demographic history has played a major role in shaping genetic variation. Using this variation efficiently to infer even very simple models of population history remains challenging: a complete description of the history of genomic samples includes the ancestral processes of both coalescence and recombination, as captured by the ancestral recombination graph (ARG). While the ARG is straightforward to simulate, in practice the number of recombination and coalescent events in any stretch of genome generally exceeds the information (i.e., the number of mutations) available to reconstruct them. As a result, it is currently not feasible to perform demographic inference by integrating over all realizations of the ARG that are compatible with a genomic dataset [1]. Current methods dealing with genomic data tackle this problem by making simplifying assumptions about recombination [2]. Methods based on single nucleotide polymorphisms (SNPs) ignore linkage information altogether and make use of the site frequency spectrum (SFS) [3, 4], which is a function only of the expected length of
