Fast Dating Using Least-Squares Criteria and Algorithms.

Thu-Hien To, Samantha Lycett, Olivier Gascuel, Matthieu Jung

doi:10.1093/sysbio/syv068

Abstract

Phylogenies provide a useful way to understand the evolutionary history of genetic samples, and data sets with more than a thousand taxa are becoming increasingly common, notably with viruses (e.g., human immunodeficiency virus (HIV)). Dating ancestral events is one of the first, essential goals with such data. However, current sophisticated probabilistic approaches struggle to handle data sets of this size. Here, we present very fast dating algorithms, based on a Gaussian model closely related to the Langley–Fitch molecular-clock model. We show that this model is robust to uncorrelated violations of the molecular clock. Our algorithms apply to serial data, where the tips of the tree have been sampled through time. They estimate the substitution rate and the dates of all ancestral nodes. When the input tree is unrooted, they can provide an estimate for the root position, thus representing a new, practical alternative to the standard rooting methods (e.g., midpoint). Our algorithms exploit the tree (recursive) structure of the problem at hand, and the close relationships between least-squares and linear algebra. We distinguish between an unconstrained setting and the case where the temporal precedence constraint (i.e., an ancestral node must be older than its daughter nodes) is accounted for. With rooted trees, the former is solved using linear algebra in linear computing time (i.e., proportional to the number of taxa), while the resolution of the latter, constrained setting, is based on an active-set method that runs in nearly linear time. With unrooted trees, the computing time becomes (nearly) quadratic (i.e., proportional to the square of the number of taxa). In all cases, very large input trees (>10,000 taxa) can easily be processed and transformed into time-scaled trees. We compare these algorithms to standard methods (root-to-tip, r8s version of Langley–Fitch method, and BEAST). Using simulated data, we show that their estimation accuracy is similar to that of the most sophisticated methods, while their computing time is much shorter. We apply these algorithms to a large data set comprising 1194 strains of Influenza virus from the pdm09 H1N1 Human pandemic. Again, the results show that these algorithms provide a very fast alternative with results similar to those of other computer programs. These algorithms are implemented in the LSD software (least-squares dating), which can be downloaded from http://www.atgc-montpellier.fr/LSD/, along with all our data sets and detailed results. An Online Appendix, providing additional algorithm descriptions, tables, and figures, can be found in the Supplementary Material available on Dryad at http://dx.doi.org/10.5061/dryad.968t3.
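To make the unconstrained, rooted setting described in the abstract concrete, here is a minimal sketch in Python. It is not the LSD implementation: it assumes unit weights on all branches, ignores the temporal precedence constraint, and solves dense normal equations instead of using the linear-time, tree-structured algorithm the abstract refers to. The function names (`solve_linear`, `ls_dating`), the input layout, and the toy tree are all hypothetical. The sketch relies on the substitution tau = rate × (date − t0) for internal nodes, which makes the model "branch length ≈ rate × (child date − parent date)" linear in the unknowns.

```python
def solve_linear(M, y):
    """Solve M x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    aug = [row[:] + [y[k]] for k, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for c in reversed(range(n)):
        s = sum(aug[c][k] * x[k] for k in range(c + 1, n))
        x[c] = (aug[c][n] - s) / aug[c][c]
    return x


def ls_dating(parent, branch_len, tip_date):
    """Unweighted least-squares dating of a rooted tree (illustrative only).

    parent:     child name -> parent name, one entry per branch
    branch_len: child name -> length of the branch above it (subst./site)
    tip_date:   tip name -> sampling date
    Returns (rate, {internal node -> estimated date}).
    """
    internal = sorted(set(parent.values()))
    col = {n: k + 1 for k, n in enumerate(internal)}  # column 0 is the rate
    # Centre the dates so the linear system is well conditioned.
    t0 = sum(tip_date.values()) / len(tip_date)
    m = len(internal) + 1
    rows, rhs = [], []
    for child, par in parent.items():
        row = [0.0] * m
        if child in tip_date:
            row[0] = tip_date[child] - t0   # known date multiplies the rate
        else:
            row[col[child]] = 1.0           # unknown tau of the child
        row[col[par]] = -1.0                # minus unknown tau of the parent
        rows.append(row)
        rhs.append(branch_len[child])
    # Normal equations (A^T A) x = A^T b of the least-squares problem.
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
              for i in range(m)]
    proj = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(m)]
    x = solve_linear(normal, proj)
    rate = x[0]
    dates = {n: t0 + x[col[n]] / rate for n in internal}
    return rate, dates


# Toy example (invented): root R, internal node A, tips t1..t3.
# Branch lengths were generated with rate 0.01, R at 2000 and A at 2005.
parent = {"A": "R", "t1": "A", "t2": "A", "t3": "R"}
branch_len = {"A": 0.05, "t1": 0.05, "t2": 0.07, "t3": 0.08}
tip_date = {"t1": 2010.0, "t2": 2012.0, "t3": 2008.0}

rate, dates = ls_dating(parent, branch_len, tip_date)
print(rate, dates)
```

Because the toy branch lengths are noise-free, the estimates should match the generating rate and dates up to floating-point error. With real data, the estimated date of a parent can end up later than that of its child, which is the situation the constrained setting in the abstract is meant to handle.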

Highlights

The explosion of genetic data and progress in phylogenetic reconstruction algorithms have resulted in increasing utility and popularity of phylogenetic analyses.
We study a model analogous to that of Langley and Fitch (LF), but using a normal approximation that allows for a least-squares approach, and show that this model is robust to uncorrelated violations of the molecular clock.
These algorithms are based on a Gaussian-noise, least-squares model that simplifies Langley and Fitch’s (1974) Poisson model, as implemented in the r8s package (Sanderson 2003).
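The Gaussian, least-squares model mentioned in these highlights can be written out roughly as follows. The notation is an assumption of this note, not taken from the text above: $b_i$ is the length of the branch above node $i$, $a(i)$ is the parent of $i$, $t_i$ is the date of node $i$ (known for tips, unknown for internal nodes), $\omega$ is the substitution rate, and $\sigma_i^2$ is a branch-specific variance whose exact form is left unspecified here.

```latex
% Normal approximation of the branch-length noise (assumed notation)
\[
  b_i \sim \mathcal{N}\!\bigl(\omega\,(t_i - t_{a(i)}),\ \sigma_i^2\bigr)
\]
% Maximizing the likelihood then amounts to minimizing a weighted
% least-squares criterion over the rate and the internal node dates
\[
  \varphi(\omega, t) \;=\; \sum_{i}
    \frac{\bigl(b_i - \omega\,(t_i - t_{a(i)})\bigr)^2}{\sigma_i^2}
\]
% Temporal precedence constraint (constrained setting only)
\[
  t_{a(i)} \le t_i \quad \text{for every non-root node } i
\]
```

Under this reading, the unconstrained setting minimizes $\varphi$ alone, while the constrained setting minimizes it subject to the precedence inequalities.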

Summary

Introduction

The explosion of genetic data and progress in phylogenetic reconstruction algorithms have resulted in increasing utility and popularity of phylogenetic analyses. Some programs (e.g., PAML, Rannala and Yang 2007) perform calculations on a fixed, user-supplied tree, while others (e.g., BEAST, Drummond and Rambaut 2007; Drummond et al. 2012) infer the tree from the sequence alignment. These programs typically contain several submodels, which describe the substitution process (e.g., GTR, distribution of rates across sites, etc.), the tree (e.g., coalescent, constant or varying population size, birth–death, etc.), priors on the parameter values and, most importantly regarding dating, the molecular clock.



Journal: Systematic Biology
Publication Date: Sep 30, 2015
Citations: 399
License type: cc-by
