Abstract

BackgroundFirst proposed by Cavender and Felsenstein, and Lake, invariant based algorithms for phylogenetic reconstruction were widely dismissed by practicing biologists because invariants were perceived to have limited accuracy in constructing trees based on DNA sequences of reasonable length. Recent developments by algebraic geometers have led to the construction of lists of invariants which have been demonstrated to be more accurate on small sequences, but were limited in that they could only be used for trees with small numbers of taxa. We have developed and tested an invariant based quartet puzzling algorithm which is accurate and efficient for biologically reasonable data sets.ResultsWe found that our algorithm outperforms Maximum Likelihood based quartet puzzling on data sets simulated with low to medium evolutionary rates. For faster rates of evolution, invariant based quartet puzzling is reasonable but less effective than maximum likelihood based puzzling.ConclusionsThis is a proof of concept algorithm which is not intended to replace existing reconstruction algorithms. Rather, the conclusion is that when seeking solutions to a new wave of phylogenetic problems (super tree algorithms, gene vs. species tree, mixture models), invariant based methods should be considered. This article demonstrates that invariants are a practical, reasonable and flexible source for reconstruction techniques.

Highlights

  • First proposed by Cavender and Felsenstein, and Lake, invariant based algorithms for phylogenetic reconstruction were widely dismissed by practicing biologists because invariants were perceived to have limited accuracy in constructing trees based on DNA sequences of reasonable length

  • Invariant based quartet puzzling To address the scalability issue, we propose a variation of quartet puzzling which uses invariants to compute the individual quartets, allowing the application of invariants to data sets of arbitrary size

  • Comparison with maximum likelihood based quartet puzzling The results of our simulation study are recorded in the four tables below

Read more

Summary

Introduction

First proposed by Cavender and Felsenstein, and Lake, invariant based algorithms for phylogenetic reconstruction were widely dismissed by practicing biologists because invariants were perceived to have limited accuracy in constructing trees based on DNA sequences of reasonable length. Recent developments by algebraic geometers have led to the construction of lists of invariants which have been demonstrated to be more accurate on small sequences, but were limited in that they could only be used for trees with small numbers of taxa. The majority of existing algorithms for phylogenetic reconstruction fall into one of three classes: distance based algorithms, parsimony algorithms, and maximum likelihood based algorithms. These classes of algorithms justifiably form the pillars of phylogenetic reconstruction, but they are each known to have shortcomings. Maximum likelihood algorithms are typically slow and suffer from long branch attraction

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call