Abstract

BackgroundIn recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Explicit modeling of substitution rates and divergence times makes formulation of informative prior distributions for branch lengths possible. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, however, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies.ResultsWe demonstrate that a hill-climbing maximum a posteriori (MAP) adaptation of the MCMC scheme results in considerable gain in computational efficiency. We demonstrate also that a novel dynamic programming (DP) algorithm for branch length factorization, useful both in the hill-climbing and in the MCMC setting, further reduces computation time. For the problem of inferring rates and times parameters on a fixed tree, we perform simulations, comparisons between hill-climbing and MCMC on a plant rbcL gene dataset, and dating analysis on an animal mtDNA dataset, showing that our methodology enables efficient, highly accurate analysis of very large trees. Datasets requiring a computation time of several days with MCMC can with our MAP algorithm be accurately analysed in less than a minute. From the results of our example analyses, we conclude that our methodology generally avoids getting trapped early in local optima. For the cases where this nevertheless can be a problem, for instance when we in addition to the parameters also infer the tree topology, we show that the problem can be evaded by using a simulated-annealing like (SAL) method in which we favour tree swaps early in the inference while biasing our focus towards rate and time parameter changes later on.ConclusionOur contribution leaves the field open for fast and accurate dating analysis of nucleotide sequence data. Modeling branch substitutions rates and divergence times separately allows us to include birth-death priors on the times without the assumption of a molecular clock. The methodology is easily adapted to take data from fossil records into account and it can be used together with a broad range of rate and substitution models.

Highlights

  • In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events

  • maximum likelihood (ML) phylogenetic inference is generally quicker than stochastic optimization inference (MCMC), the time complexity of ML-algorithms has been prohibitive for analysis of large phylogenies

  • We further introduce a dynamic programming (DP) algorithm for optimal factorization of branch lengths into rates and times, thereby considerably reducing the computation time needed for hill-climbing as well as Markov chain Monte Carlo (MCMC)-algorithms

Read more

Summary

Introduction

In recent years there has been a trend of leaving the strict molecular clock in order to infer dating of speciations and other evolutionary events. Models with birth-death priors on tree branching and auto-correlated or iid substitution rates among lineages have been proposed, enabling simultaneous inference of substitution rates and divergence times. This problem has, mainly been analysed in the Markov chain Monte Carlo (MCMC) framework, an approach requiring computation times of hours or days when applied to large phylogenies. The field has seen considerable advances in speed of the best methods, among which PHYML [2] and RAxML [3] are notable, seemingly without sacrificing much accuracy. Among these are models with molecular clocks operating locally [9,10] and a range of models with rate variation over lineages; these include auto-correlated models, i.e., the rate distribution for a particular branch depends on the rate value of the parent branch [11,12,13,14,15], and uncorrelated models where rates are drawn independently from a common underlying distribution [16] (Sennblad et al.: Parental guidance vs. mutual independence – evaluation of bayesian models of substitution rate evolution, submitted)

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.