Abstract

We develop a Bayesian method for inferring the species phylogeny under the multispecies coalescent (MSC) model. To improve the mixing properties of the Markov chain Monte Carlo (MCMC) algorithm that traverses the space of species trees, we implement two efficient MCMC proposals: the first is based on the Subtree Pruning and Regrafting (SPR) algorithm and the second is based on a node-slider algorithm. Like the Nearest-Neighbor Interchange (NNI) algorithm we implemented previously, both new algorithms propose changes to the species tree, while simultaneously altering the gene trees at multiple genetic loci to automatically avoid conflicts with the newly proposed species tree. The method integrates over gene trees, naturally taking account of the uncertainty of gene tree topology and branch lengths given the sequence data. A simulation study was performed to examine the statistical properties of the new method. The method was found to show excellent statistical performance, inferring the correct species tree with near certainty when 10 loci were included in the dataset. The prior on species trees has some impact, particularly for small numbers of loci. We analyzed several previously published datasets (both real and simulated) for rattlesnakes and Philippine shrews, in comparison with alternative methods. The results suggest that the Bayesian coalescent-based method is statistically more efficient than heuristic methods based on summary statistics, and that our implementation is computationally more efficient than alternative full-likelihood methods under the MSC. Parameter estimates for the rattlesnake data suggest drastically different evolutionary dynamics between the nuclear and mitochondrial loci, even though they support largely consistent species trees. We discuss the different challenges facing the marginal likelihood calculation and transmodel MCMC as alternative strategies for estimating posterior probabilities for species trees. 
[Bayes factor; Bayesian inference; MCMC; multispecies coalescent; nodeslider; species tree; SPR.]

Highlights

  • Multilocus genetic sequence data have gained importance in inferring species trees in recent years and several inference methods have been proposed for this purpose

  • For species tree inference the model with the highest posterior probability is presented as the maximum a posteriori (MAP) model

  • If the model parameters (θs and τs) are of primary interest, we suggest that one should run the program a second time with the species tree fixed to the MAP tree or consensus tree



Introduction

Multilocus genetic sequence data have gained importance in inferring species trees in recent years, and several inference methods have been proposed for this purpose. As noted by Maddison (1997), several processes can cause the species tree to differ from the gene trees underlying particular loci. A simple, widely used method for multilocus species tree inference concatenates sequences from different loci, assuming that a single tree (treated as the species tree) underlies all the loci (reviewed in Rannala and Yang, 2008; Edwards, 2009). This approach can lead to strongly supported incorrect phylogenetic trees when incomplete lineage sorting occurs (see, e.g., Leaché and Rannala, 2011), and has been shown to be inconsistent (Kubatko and Degnan, 2007). Use of the most frequent gene tree among loci as the species tree estimate can be inconsistent in the so-called "anomaly zone" (Degnan and Salter, 2005; Degnan and Rosenberg, 2006).
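To make the gene tree/species tree discordance concrete, the sketch below evaluates the standard MSC result for three species with one sampled sequence each. It assumes a species tree ((A,B),C) whose internal branch has length t in coalescent units; the function name `gene_tree_probs` and the topology labels are illustrative only and are not taken from the authors' program.

```python
import math

def gene_tree_probs(t):
    """Gene tree topology probabilities under the MSC for the
    species tree ((A,B),C), one sequence per species.

    t is the internal branch length in coalescent units (assumed >= 0).
    """
    # Probability that the A and B lineages fail to coalesce on the
    # internal branch (incomplete lineage sorting).
    p_no_coal = math.exp(-t)
    # If they fail, all three lineages enter the root population and
    # each of the three topologies is equally likely.
    discordant = p_no_coal / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }

if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 2.0):
        probs = gene_tree_probs(t)
        print(t, {k: round(v, 4) for k, v in probs.items()})
```

For short internal branches, a large fraction of loci are expected to have gene trees that conflict with the species tree, which is the situation in which concatenation can mislead. With only three species the matching topology is never less probable than either alternative, so the anomaly zone arises only for species trees with four or more species.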
