Abstract

Maximum parsimony (MP) methods aim to reconstruct the phylogeny of extant species by finding the most parsimonious evolutionary scenario using the species' genome data. MP methods are considered to be accurate, but they are also computationally expensive, especially for a large number of species. Several disk-covering methods (DCMs), which decompose the input species into multiple overlapping subgroups (or disks), have been proposed to solve the problem in a divide-and-conquer way. We design a new DCM based on the spectral method and also develop the COGNAC (Comparing Orders of Genes using Novel Algorithms and high-performance Computers) software package. COGNAC uses the new DCM to reduce the phylogenetic tree search space and selects an output tree from the reduced search space based on the MP principle. We test the new DCM using gene order data and inversion distance. The new DCM not only reduces the number of candidate tree topologies but also excludes erroneous tree topologies that the original MP methods can select. Initial labeling of internal genomes affects the accuracy of MP methods using gene order data, and the new DCM enables more accurate initial labeling as well. COGNAC demonstrates superior accuracy as a consequence. We compare COGNAC with FastME and with the combination of the state-of-the-art DCM (Rec-I-DCM3) and GRAPPA. COGNAC clearly outperforms FastME in accuracy. Using the new DCM, COGNAC also reconstructs a much more accurate tree in significantly shorter time than GRAPPA with Rec-I-DCM3.
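To make "gene order data and inversion distance" concrete, the following is a minimal sketch and not COGNAC's implementation. It assumes a genome is a linear sequence of signed gene identifiers (the sign encodes strand), that an inversion reverses a contiguous segment and flips the signs inside it, and that the inversion distance is the minimum number of such inversions turning one genome into another. The function names `invert` and `inversion_distance` are hypothetical, and the brute-force breadth-first search is only practical for a handful of genes.

```python
from collections import deque


def invert(genome, i, j):
    """Reverse genome[i..j] (inclusive) and flip the sign of each gene in it."""
    segment = tuple(-g for g in reversed(genome[i:j + 1]))
    return genome[:i] + segment + genome[j + 1:]


def inversion_distance(source, target):
    """Minimum number of inversions from source to target.

    Brute-force breadth-first search over signed gene orders.
    Assumption: linear genomes with the same gene content; tiny inputs only.
    """
    source, target = tuple(source), tuple(target)
    if sorted(map(abs, source)) != sorted(map(abs, target)):
        raise ValueError("genomes must contain the same genes")
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        genome, dist = queue.popleft()
        if genome == target:
            return dist
        n = len(genome)
        # Try every contiguous segment as the next inversion.
        for i in range(n):
            for j in range(i, n):
                nxt = invert(genome, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise AssertionError("every signed gene order is reachable by inversions")


if __name__ == "__main__":
    print(invert((1, 2, 3), 0, 1))
    print(inversion_distance((1, 2, 3), (1, -3, -2)))
```

Practical tools compute this distance with specialized algorithms rather than exhaustive search; the sketch only illustrates the definition of the operation.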

Highlights

  • Each cell reports the number of false positives (FP), the number of false negatives (FN), and the execution time, averaged over the computations that finished out of 10 trials using 10 different model trees. In the tables, h, m, and s denote hours, minutes, and seconds, respectively. doi:10.1371/journal.pone.0022483.t001

  • Each cell reports the number of false positives (FP), the number of false negatives (FN), and the execution time, averaged over the computations that finished out of 10 trials using 10 different model trees. In the tables, h, m, and s denote hours, minutes, and seconds, respectively. doi:10.1371/journal.pone.0022483.t002

Summary

Introduction

Maximum parsimony (MP) [1,2] methods enumerate candidate trees for the input species and select the most parsimonious tree as the output tree by processing the input species’ genome data (such as nucleotide sequence data or gene order data). Even with an efficient branch-and-bound strategy and a relatively small number of species, MP methods need to evaluate a large number of candidate tree topologies. Ranking different tree topologies is much more expensive for gene order data than for nucleotide sequence data. There is no known algorithm to find the most parsimonious labeling of the internal genomes in a tree to compute the tree’s parsimony score if the tree has more than three leaf genomes [3]. As a result, a large number of candidate trees is even more problematic for gene order data.
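As a rough illustration of how quickly the candidate set grows, the sketch below counts the unrooted binary tree topologies on n labeled leaves, which is the standard double factorial (2n − 5)!! for n ≥ 3. It assumes the search ranges over unrooted binary trees and ignores any pruning by branch and bound, so it is an upper bound on the search space rather than the number of trees a particular MP implementation actually scores. The function name is hypothetical.

```python
def num_unrooted_binary_trees(n_leaves):
    """Count unrooted binary tree topologies on n labeled leaves: (2n - 5)!!."""
    if n_leaves < 3:
        raise ValueError("need at least three leaves")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n - 5 (empty product for n = 3).
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, num_unrooted_binary_trees(n))
```

For 10 species the count is already 2,027,025, which suggests why reducing the search space before scoring matters when each tree is expensive to score.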
