Abstract

Rapidly developing sequencing technologies and declining costs have made it possible to collect genome-scale data from population-level samples in nonmodel systems. Inferential tools for historical demography given these data sets are, at present, underdeveloped. In particular, approximate Bayesian computation (ABC) has yet to be widely embraced by researchers generating these data. Here, we demonstrate the promise of ABC for analysis of the large data sets that are now attainable from nonmodel taxa through current genomic sequencing technologies. We develop and test an ABC framework for model selection and parameter estimation, given histories of three-population divergence with admixture. We then explore different sampling regimes to illustrate how sampling more loci, longer loci or more individuals affects the quality of model selection and parameter estimation in this ABC framework. Our results show that inferences improved substantially with increases in the number and/or length of sequenced loci, while less benefit was gained by sampling large numbers of individuals. Optimal sampling strategies given our inferential models included at least 2000 loci, each approximately 2 kb in length, sampled from five diploid individuals per population, although specific strategies are model and question dependent. We tested our ABC approach through simulation-based cross-validations and illustrate its application using previously analysed data from the oak gall wasp, Biorhiza pallida.

Highlights

  • Approximate Bayesian computation (ABC) has enjoyed increasing popularity as a method for model comparison and parameter estimation in population genetics since its introduction by Tavaré et al. (1997)

  • We investigate the utility of ABC for demographic inference from population genomic data, using simulation-based validations to examine the influence of sampling attributes of the data set on model selection and parameter estimation

  • We further examined the influence of the number of summary statistics used by testing the performance of ABC-based model selection and parameter estimation using only the distribution means and not the higher moments, for two of the simulated sampling schemes
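The contrast between using only distribution means and adding higher moments can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `moment_summaries` is hypothetical, and the choice of variance, skewness and kurtosis as the "higher moments" is an assumption made here for concreteness.

```python
import statistics


def moment_summaries(values, means_only=False):
    """Summarise the across-locus distribution of one per-locus statistic.

    Hypothetical helper: returns [mean] when means_only is True, otherwise
    [mean, variance, skewness, kurtosis] computed from central moments.
    """
    n = len(values)
    mean = statistics.fmean(values)
    if means_only:
        return [mean]
    # Population (divide-by-n) central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        # All loci identical: shape statistics are undefined, report zeros.
        return [mean, 0.0, 0.0, 0.0]
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return [mean, m2, skewness, kurtosis]
```

Calling the function once per per-locus statistic and concatenating the results gives either a short vector (means only) or a vector four times as long, which is the kind of difference in summary-statistic count the comparison above refers to.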

Introduction

Approximate Bayesian computation (ABC) has enjoyed increasing popularity as a method for model comparison and parameter estimation in population genetics since its introduction by Tavaré et al. (1997). Major challenges in ABC include the selection of sufficient summary statistics (which may not be available for the parameters or models considered; Csilléry et al. 2010; Aeschbacher et al. 2012) and the high computational cost of simulating the model-specific data to which observed values are compared. This cost is significant for genome-scale data (Sousa & Hey 2013), which are highly attractive for demographic inference because relevant parameters are best estimated from samples of many genes (Felsenstein 2006; Li & Jakobsson 2012). Declining sequencing costs (Pool et al. 2010) and the development of individual barcoding methods that allow population-level sampling (Baird et al. 2008; Peterson et al. 2012) increase the feasibility of genome-level sampling of nonmodel taxa.
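The simulate-and-compare cycle that drives this computational cost can be illustrated with a rejection-ABC sketch. It is a toy under stated assumptions, not the three-population framework described in the abstract: it estimates a single mutation parameter `theta` from pairwise differences at independent loci in one population, uses a uniform prior on [0.1, 10], and summarises the data by the across-locus mean and variance. All function names are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's Poisson sampler; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_loci(rng, theta, n_loci):
    """Toy model: pairwise differences at n_loci independent loci.

    The coalescence time of two lineages is Exp(1) in coalescent units,
    and mutations accumulate as Poisson(theta * time).
    """
    return [poisson(rng, theta * rng.expovariate(1.0)) for _ in range(n_loci)]


def summarise(diffs):
    """Summary statistics: across-locus mean and variance."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / n
    return (mean, var)


def abc_rejection(observed, n_sims, accept_frac, rng):
    """Return the prior draws of theta whose simulations lie closest to the data."""
    obs = summarise(observed)
    n_loci = len(observed)
    draws = []
    for _ in range(n_sims):
        theta = rng.uniform(0.1, 10.0)  # assumed uniform prior
        sim = summarise(simulate_loci(rng, theta, n_loci))
        # Euclidean distance after scaling each statistic by its observed value.
        dist = math.sqrt(sum(((s - o) / (abs(o) + 1e-9)) ** 2
                             for s, o in zip(sim, obs)))
        draws.append((dist, theta))
    draws.sort()
    keep = max(1, int(accept_frac * n_sims))
    return [theta for _, theta in draws[:keep]]
```

For example, `abc_rejection(simulate_loci(rng, 2.0, 200), 1000, 0.05, rng)` with `rng = random.Random(1)` returns 50 accepted values of `theta` that approximate its posterior. Each prior draw requires simulating every locus, so the cost grows with both the number of simulations and the number of loci, which is why genome-scale data make ABC expensive.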
