Fast computation of distance estimators

Isaac Elias, Jens Lagergren

doi:10.1186/1471-2105-8-89

Abstract

Background: Some distance methods are among the most commonly used methods for reconstructing phylogenetic trees from sequence data. The input to a distance method is a distance matrix containing estimated pairwise distances between all pairs of taxa. Distance methods themselves are often fast; for example, the popular Neighbor Joining (NJ) algorithm reconstructs a phylogeny of n taxa in time O(n^3). Unfortunately, the fastest practical algorithms known for computing the distance matrix from n sequences of length l take time proportional to l·n^2. Since the sequence length is typically much larger than the number of taxa, distance estimation is the bottleneck in phylogeny reconstruction. This bottleneck is especially apparent in the reconstruction of large phylogenies and in applications where many trees have to be reconstructed, e.g., bootstrapping and genome-wide applications.

Results: We give an advanced algorithm for computing the number of mutational events between DNA sequences which is significantly faster than both Phylip and Paup. Moreover, we give a new method for estimating pairwise distances between sequences that contain ambiguity symbols. This new method is shown to be more accurate as well as faster than earlier methods.

Conclusion: Our novel algorithm for computing distance estimators provides a valuable tool in phylogeny reconstruction. Since the running time of our distance estimation algorithm is comparable to that of most distance methods, the previous bottleneck is removed. All distance methods, such as NJ, require a distance matrix as input; hence, our algorithm significantly improves the overall running time of all distance methods. In particular, we show for real-world biological applications how the running time of phylogeny reconstruction using NJ is improved from a matter of hours to a matter of seconds.

Highlights

Some distance methods are among the most commonly used methods for reconstructing phylogenetic trees from sequence data.
Since computing the distance matrix is a prerequisite for every distance method, our algorithm speeds up all distance methods, e.g., Neighbor Joining (NJ).
The paper presents a novel algorithm for computing the number of mutational events. In distance estimation, a specific model of sequence evolution is used to derive an estimate of the true mutational distance between two sequences from the number of observed mutational events.
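As a rough illustration of how a model of sequence evolution turns an observed mismatch count into a distance estimate, the sketch below applies the standard Jukes-Cantor (JC69) correction, d = -(3/4) ln(1 - (4/3) p), where p is the observed fraction of differing sites. The choice of JC69 is an assumption made here for illustration; the text above does not say which models the paper uses, and the function name `jc69_distance` is hypothetical.

```python
import math


def jc69_distance(mismatches: int, sites: int) -> float:
    """Jukes-Cantor (JC69) distance from an observed mismatch count.

    Illustrative assumption: JC69 is used only as a familiar example of a
    model-based correction; it is not taken from the paper.
    """
    if sites <= 0:
        raise ValueError("sites must be positive")
    if not 0 <= mismatches <= sites:
        raise ValueError("mismatches must lie between 0 and sites")

    p = mismatches / sites  # observed proportion of differing sites
    if p >= 0.75:
        # Saturation: the logarithm's argument is no longer positive,
        # so the corrected distance is reported as infinite.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, `jc69_distance(10, 100)` gives a value slightly above the raw proportion 0.1, because the correction accounts for sites that may have mutated more than once.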

Summary

Introduction

Some distance methods are among the most commonly used methods for reconstructing phylogenetic trees from sequence data. Distance methods themselves are often fast; for example, the popular Neighbor Joining (NJ) algorithm reconstructs a phylogeny of n taxa in time O(n^3). The fastest practical algorithms known for computing the distance matrix from n sequences of length l take time proportional to l·n^2. Since the sequence length is typically much larger than the number of taxa, distance estimation is the bottleneck in phylogeny reconstruction. This bottleneck is especially apparent in the reconstruction of large phylogenies and in applications where many trees have to be reconstructed, e.g., bootstrapping and genome-wide applications.
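To make the l·n^2 cost concrete, here is a minimal sketch of the straightforward baseline: compare every pair of aligned sequences site by site and count the mismatches. This is not the authors' faster algorithm, only the kind of direct computation the paragraph above describes as the bottleneck. The function name `mismatch_matrix` and the assumption that all sequences are already aligned to equal length are introduced here for illustration.

```python
from itertools import combinations


def mismatch_matrix(seqs):
    """Count differing sites for every pair of equal-length sequences.

    Hypothetical baseline: n*(n-1)/2 pairs, each scanned over l sites,
    so the total work grows in proportion to l * n^2.
    """
    n = len(seqs)
    if n == 0:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")

    counts = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # One full pass over both sequences per pair.
        diff = sum(a != b for a, b in zip(seqs[i], seqs[j]))
        counts[i][j] = counts[j][i] = diff
    return counts
```

Calling `mismatch_matrix(["ACGT", "ACGA", "TCGA"])` returns a symmetric 3×3 matrix of mismatch counts with zeros on the diagonal; a model-based correction would then be applied to each entry to obtain the distance matrix.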

Journal: BMC Bioinformatics	Publication Date: Mar 13, 2007
Citations: 26	License type: CC BY 2.0

Similar Papers

Distance-Based Phylogeny Reconstruction (Optimal Radius)
Richard Desper ... Olivier Gascuel
01 Jan 2008

A signal-to-noise analysis of phylogeny estimation by neighbor-joining: Insufficiency of polynomial length sequences
Michelle R Lacey ... Joseph T Chang
Mathematical Biosciences | VOL. 199
18 Jan 2006

A note on Sattath and Tversky's, Saitou and Nei's, and Studier and Keppler's algorithms for inferring phylogenies from evolutionary distances.
Molecular biology and evolution | VOL. 11
01 Nov 1994

Accuracy Guarantees for Phylogeny Reconstruction Algorithms Based on Balanced Minimum Evolution
Magnus Bordewich ... Radu Mihaescu
IEEE/ACM Transactions on Computational Biology and Bioinformatics | VOL. 10
01 May 2013
