Abstract

Due to its speed, the distance approach remains the best hope for building phylogenies on very large sets of taxa. Recently (R. Desper and O. Gascuel, J. Comp. Biol. 9:687-705, 2002), we introduced a new "balanced" minimum evolution (BME) principle, based on a branch length estimation scheme of Y. Pauplin (J. Mol. Evol. 51:41-47, 2000). Initial simulations suggested that FASTME, our program implementing the BME principle, was more accurate than or equivalent to all other distance methods we tested, with a running time significantly shorter than that of Neighbor-Joining (NJ). This article further explores the properties of the BME principle and explains and illustrates its impressive topological accuracy. We prove that the BME principle is a special case of the weighted least-squares approach, with biologically meaningful variances of the distance estimates. We show that the BME principle is statistically consistent. We demonstrate that FASTME only produces trees with positive branch lengths, a feature that separates this approach from NJ (and related methods), which may produce trees with biologically meaningless negative branch lengths. Finally, we consider a large simulated data set of 5,000 100-taxon trees, generated by the Aldous beta-splitting distribution, which encompasses a range of distributions from Yule-Harding to uniform, and using a covarion-like model of sequence evolution. FASTME produces trees faster than NJ, and much faster than WEIGHBOR and the weighted least-squares implementation of PAUP*. Moreover, FASTME trees are consistently more accurate at all settings: across Yule-Harding to uniform distributions, and across all ranges of maximum pairwise divergence and departure from the molecular clock. Interestingly, the covarion parameter has little effect on tree quality for any of the algorithms. FASTME is freely available on the web.
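The balanced length that BME minimizes comes from Pauplin (2000): for a topology T and a distance matrix d, it is the sum over all leaf pairs i, j of 2^(1 - τ_ij) · d_ij, where τ_ij is the number of edges on the path between leaves i and j. The Python below is a minimal sketch of that formula only, not FASTME's code. It assumes an unrooted topology stored as an adjacency dict and distances keyed by leaf pairs; the function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def topological_distances(adj, leaves):
    """tau[(i, j)]: number of edges on the path between leaves i and j."""
    tau = {}
    for src in leaves:
        # Breadth-first search from each leaf gives edge counts to all nodes.
        depth = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in depth:
                    depth[nbr] = depth[node] + 1
                    queue.append(nbr)
        for dst in leaves:
            tau[(src, dst)] = depth[dst]
    return tau

def balanced_tree_length(adj, leaves, d):
    """Pauplin's balanced length: sum over leaf pairs of 2**(1 - tau_ij) * d_ij."""
    tau = topological_distances(adj, leaves)
    return sum(2.0 ** (1 - tau[(i, j)]) * d[(i, j)]
               for i, j in combinations(leaves, 2))

# Unrooted quartet ((a,b),(c,d)); u and v are the internal nodes.
adj = {"a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
       "u": ["a", "b", "v"], "v": ["c", "d", "u"]}
leaves = ["a", "b", "c", "d"]
# Additive distances from edge lengths a=1, b=2, c=3, d=4 and u-v=5.
d = {("a", "b"): 3, ("a", "c"): 9, ("a", "d"): 10,
     ("b", "c"): 10, ("b", "d"): 11, ("c", "d"): 7}
print(balanced_tree_length(adj, leaves, d))  # 15.0
```

Because the quartet distances are additive, the balanced length equals the true total edge length, 1 + 2 + 3 + 4 + 5 = 15. Under BME, the preferred topology is the one with the smallest such length; the search over candidate topologies is not shown here.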

Highlights

  • Distance-based methods for phylogeny reconstruction represent the best hope for accurately building phylogenies on very large sets of taxa

  • Initial simulations on a 2,000-tree data set suggested that our program, FASTME, was at least as accurate as the Fitch-Margoliash approach to tree fitting, and we proved that FASTME uses an algorithm whose running time is significantly better than that of Neighbor-Joining (NJ)

  • In Desper and Gascuel (2002), we considered simulated data with trees generated by a Yule-Harding (Yule 1925; Harding 1971) process, with random variation of branch lengths and random perturbation from a molecular clock
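The Yule-Harding process mentioned above is the β = 0 member of Aldous's beta-splitting family, which the abstract uses for its 100-taxon data set; β = -3/2 gives the uniform distribution on labeled rooted topologies. As an illustrative sketch (not the authors' simulation code), the Python below draws a rooted topology by recursively splitting a clade of n taxa into a left part of size i with probability proportional to Γ(β + i + 1) Γ(β + n - i + 1) / (Γ(i + 1) Γ(n - i + 1)). The function names are hypothetical, and the branch-length variation and clock perturbation described above are not modeled.

```python
import math
import random

def split_weights(n, beta):
    """Unnormalized Aldous beta-splitting weights for left-clade sizes i = 1..n-1.

    Valid for beta > -2, so every lgamma argument stays positive.
    """
    return [math.exp(math.lgamma(beta + i + 1) + math.lgamma(beta + n - i + 1)
                     - math.lgamma(i + 1) - math.lgamma(n - i + 1))
            for i in range(1, n)]

def beta_splitting_tree(taxa, beta, rng):
    """Recursively split taxa; returns a rooted topology as nested 2-tuples."""
    n = len(taxa)
    if n == 1:
        return taxa[0]
    i = rng.choices(range(1, n), weights=split_weights(n, beta))[0]
    return (beta_splitting_tree(taxa[:i], beta, rng),
            beta_splitting_tree(taxa[i:], beta, rng))

def random_tree(taxa, beta, seed=None):
    """Shuffle labels first so each clade receives a uniformly random subset."""
    rng = random.Random(seed)
    order = list(taxa)
    rng.shuffle(order)
    return beta_splitting_tree(order, beta, rng)

def leaf_labels(tree):
    """Flatten a nested-tuple topology into its list of leaf labels."""
    if isinstance(tree, tuple):
        return leaf_labels(tree[0]) + leaf_labels(tree[1])
    return [tree]

# beta = 0.0 is Yule-Harding; beta = -1.5 is the uniform model.
yule = random_tree([f"t{k}" for k in range(100)], beta=0.0, seed=1)
uniform = random_tree([f"t{k}" for k in range(100)], beta=-1.5, seed=1)
print(len(leaf_labels(yule)), len(leaf_labels(uniform)))  # 100 100
```

With β = 0 every weight is 1, so the left-clade size is uniform on 1..n-1, which is the Yule-Harding split rule; intermediate β values interpolate between the two extremes used in the abstract.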


Summary

Introduction

Distance-based methods for phylogeny reconstruction represent the best hope for accurately building phylogenies on very large sets of taxa. We demonstrate that the balanced minimum evolution branch lengths represent a special type of weighted least-squares tree fitting, in which the variance of each leaf-to-leaf distance estimate is assumed to be exponentially related to the topological distance in the tree between the pair of leaves. The second part of Theorem 1 indicates that FASTME should be close to FITCH and PAUP*, which use equation 1 to define the tree length, and, to a lesser extent, to WEIGHBOR, which is based on a different tree-building strategy but is also a weighted least-squares (WLS) approach. We also show that the BME principle is statistically consistent. In other words, assuming that the model used to estimate the pairwise distance matrix is satisfied, the more data we have, the higher the probability of recovering the correct tree. This property is essential and has been discussed at length in the past (e.g., Felsenstein 1978).
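To make the weighted least-squares reading concrete: WLS scores a tree by the sum over leaf pairs of (d_ij - δ_ij)² / Var(d_ij), where δ_ij is the path length between leaves i and j in the fitted tree. The sketch below assumes, for illustration, that Var(d_ij) is proportional to 2^τ_ij, with τ_ij the number of edges between i and j; this is one concrete reading of "exponentially related", not a quotation of Theorem 1. Topologically distant pairs, whose distance estimates are noisier, therefore count less. The tree format and function names are hypothetical.

```python
def path_stats(adj, src):
    """Map every node to (edge count, summed branch length) on its path from src.

    In a tree each node is reached by exactly one path, so a plain
    depth-first traversal is enough.
    """
    stats = {src: (0, 0.0)}
    stack = [src]
    while stack:
        node = stack.pop()
        steps, length = stats[node]
        for nbr, edge_len in adj[node].items():
            if nbr not in stats:
                stats[nbr] = (steps + 1, length + edge_len)
                stack.append(nbr)
    return stats

def wls_criterion(adj, leaves, d):
    """Weighted residual sum of squares, assuming Var(d_ij) proportional to 2**tau_ij."""
    total = 0.0
    for a, i in enumerate(leaves):
        stats = path_stats(adj, i)
        for j in leaves[a + 1:]:
            tau, delta = stats[j]  # topological and patristic distance
            total += (d[(i, j)] - delta) ** 2 / 2.0 ** tau
    return total

# Quartet ((a,b),(c,d)): internal nodes u, v; dicts map neighbor -> branch length.
adj = {"a": {"u": 1.0}, "b": {"u": 2.0}, "c": {"v": 3.0}, "d": {"v": 4.0},
       "u": {"a": 1.0, "b": 2.0, "v": 5.0},
       "v": {"c": 3.0, "d": 4.0, "u": 5.0}}
leaves = ["a", "b", "c", "d"]
d = {("a", "b"): 3.0, ("a", "c"): 9.0, ("a", "d"): 10.0,
     ("b", "c"): 10.0, ("b", "d"): 11.0, ("c", "d"): 7.0}
print(wls_criterion(adj, leaves, d))  # 0.0
```

With exact additive distances the criterion is 0. Overestimating d_ab (τ = 2) by 1 adds 1/4, while the same error on d_ac (τ = 3) adds only 1/8. This sketch scores fixed branch lengths; fitting them by minimizing the criterion is not shown.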

