Abstract

Genome-scale diversity data are increasingly available in a variety of biological systems and can be used to reconstruct the evolutionary history of species divergence. However, extracting the full demographic information from these data is not trivial and requires inferential methods that account for the diversity of coalescent histories throughout the genome. Here, we evaluate the potential and limitations of one such approach. We reexamine a well-known system of mussel sister species, using the joint site frequency spectrum (jSFS) of synonymous mutations computed from either exome capture or RNA-seq, in an Approximate Bayesian Computation (ABC) framework. We first assess the best sampling strategy (number of individuals, number of loci, and number of bins in the jSFS) and show that model selection is robust to variation in the number of individuals and loci. In contrast, the choice of bins used to summarize the jSFS strongly affects the results: including classes of low- and high-frequency shared polymorphisms can more effectively reveal recent migration events. We then take advantage of the flexibility of ABC to compare more realistic models of speciation, including variation in migration rates through time (i.e., periodic connectivity) and across genes (i.e., genome-wide heterogeneity in migration rates). We show that these models were consistently selected as the most probable, suggesting that mussels have experienced a complex history of gene flow during divergence and that the species boundary is semi-permeable. Our work provides a comprehensive evaluation of ABC demographic inference in mussels based on the coding jSFS, and supplies guidelines for employing different sequencing techniques and sampling strategies. We emphasize, perhaps surprisingly, that inferences are limited less by the volume of data than by the way in which they are analyzed.
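
To make the binning question concrete, the following minimal sketch builds an unfolded jSFS from per-SNP derived-allele counts and collapses it into a few coarse classes. It is an illustration only, not the authors' pipeline: the function names, the class labels, and the 0.25/0.75 frequency thresholds are assumptions chosen for the example.

```python
import numpy as np

def joint_sfs(counts_a, counts_b, n_a, n_b):
    """Unfolded jSFS: entry [i, j] counts SNPs with i derived copies in
    sample A and j derived copies in sample B (n_a, n_b gene copies)."""
    sfs = np.zeros((n_a + 1, n_b + 1), dtype=int)
    # np.add.at accumulates correctly when several SNPs hit the same cell.
    np.add.at(sfs, (np.asarray(counts_a), np.asarray(counts_b)), 1)
    return sfs

def binned_summary(sfs, low=0.25, high=0.75):
    """Collapse the jSFS into coarse classes.

    The thresholds `low` and `high` are hypothetical; they only illustrate
    how shared polymorphisms can be split by frequency.
    """
    n_a, n_b = sfs.shape[0] - 1, sfs.shape[1] - 1
    i, j = np.indices(sfs.shape)
    poly_a = (i > 0) & (i < n_a)      # polymorphic in sample A
    poly_b = (j > 0) & (j < n_b)      # polymorphic in sample B
    shared = poly_a & poly_b
    freq_a, freq_b = i / n_a, j / n_b
    shared_low = shared & (freq_a <= low) & (freq_b <= low)
    shared_high = shared & (freq_a >= high) & (freq_b >= high)
    masks = {
        "private_a": poly_a & ~poly_b,
        "private_b": ~poly_a & poly_b,
        "fixed": ((i == n_a) & (j == 0)) | ((i == 0) & (j == n_b)),
        "shared_low": shared_low,
        "shared_high": shared_high,
        "shared_other": shared & ~shared_low & ~shared_high,
    }
    return {name: int(sfs[mask].sum()) for name, mask in masks.items()}

# Six made-up SNPs, four gene copies sampled per species.
example = joint_sfs([1, 0, 4, 1, 3, 2], [0, 2, 0, 1, 3, 2], n_a=4, n_b=4)
print(binned_summary(example))
```

Dropping the `shared_low` and `shared_high` classes and keeping a single "shared" count would mimic a coarser summary of the kind the abstract describes as less informative about recent migration.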

Highlights

  • The biodiversity we inherited from the Quaternary was shaped by the process of species formation (Hewitt, 2000)

  • The use of coding sequences to infer divergence histories of closely related species is justified for several reasons: (i) synonymous mutations are less affected by direct selection than other categories of mutation, and selection affects chromosomal regions larger than genes themselves, including non-coding regions; (ii) we implicitly modeled the effects of selection against migrant genes by including heterogeneous effective migration rates across the genome

  • We have shown that two high-throughput sequencing datasets (exome capture and RNA-seq) imply the same history of divergence in mussels, regardless of the number of individuals or SNPs sampled, but conditional on the inclusion of informative classes in the joint site frequency spectrum (jSFS)

Introduction

The biodiversity we inherited from the Quaternary was shaped by the process of species formation (Hewitt, 2000). Model-based inferences from genetic data have been used to investigate the history of gene flow (Beaumont et al., 2010). Special attention has been paid to the distinction between recent divergence under a strict isolation (SI) model and older divergence with continuous migration (Nielsen & Wakeley, 2001), although more complex scenarios are possible (Marino et al., 2013; Sousa & Hey, 2013). With next-generation sequencing technologies, thousands of SNPs throughout the genome can be used to infer the demographic histories of non-model species pairs (Sousa & Hey, 2013). A recent and fast maximum-likelihood method based on the jSFS (Gutenkunst et al., 2009) has proven useful for distinguishing continuous migration from SI (e.g., in ragworts, Chapman, Hiscock & Filatov, 2013, and beach mice, Domingues et al., 2012). As a consequence, simulations need to be conducted to evaluate competing models, and the computational speed advantage is lost.
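
The idea of evaluating competing models by simulation can be sketched with a plain ABC rejection scheme. The example below is a toy under stated assumptions, not the method used in the study: `simulate_si` and `simulate_im` are hypothetical stand-ins for coalescent simulators, the single summary statistic (the proportion of shared polymorphisms) replaces the full binned jSFS, and the priors and constants are arbitrary.

```python
import numpy as np

def simulate_si(rng):
    """Toy strict-isolation model (hypothetical): shared polymorphism
    decays with divergence time and is never replenished."""
    t = rng.uniform(0.5, 5.0)                 # arbitrary prior on divergence time
    p_shared = 0.3 * np.exp(-t)
    return rng.binomial(1000, p_shared) / 1000

def simulate_im(rng):
    """Toy migration model (hypothetical): gene flow of intensity m
    restores part of the shared polymorphism lost since divergence."""
    t = rng.uniform(0.5, 5.0)
    m = rng.uniform(0.0, 1.0)                 # arbitrary prior on migration
    decay = np.exp(-t)
    p_shared = 0.3 * (decay + m * (1.0 - decay))
    return rng.binomial(1000, p_shared) / 1000

def abc_model_choice(observed, simulators, n_sims=5000, accept_frac=0.01, seed=0):
    """Rejection ABC for model choice with equal prior weight per model.

    Returns, for each model, its share among the simulations whose
    summary statistic lies closest to the observed value.
    """
    rng = np.random.default_rng(seed)
    names = list(simulators)
    labels = np.repeat(np.arange(len(names)), n_sims)
    stats = np.array([simulators[names[k]](rng) for k in labels])
    distance = np.abs(stats - observed)
    n_keep = max(1, int(accept_frac * distance.size))
    kept = labels[np.argsort(distance)[:n_keep]]
    return {name: float(np.mean(kept == k)) for k, name in enumerate(names)}

# An observed proportion of shared polymorphisms that the toy SI model
# can hardly reach, so the accepted simulations should come from IM.
posterior = abc_model_choice(0.25, {"SI": simulate_si, "IM": simulate_im})
print(posterior)
```

In a realistic analysis the scalar statistic would be replaced by a vector of jSFS classes and the toy functions by a coalescent simulator, which is where the choice of bins enters the comparison.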

