Abstract
Maximum parsimony (MP) methods aim to reconstruct the phylogeny of extant species by finding the most parsimonious evolutionary scenario using the species' genome data. MP methods are considered to be accurate, but they are also computationally expensive, especially for a large number of species. Several disk-covering methods (DCMs), which decompose the input species into multiple overlapping subgroups (or disks), have been proposed to solve the problem in a divide-and-conquer way. We design a new DCM based on the spectral method and also develop the COGNAC (Comparing Orders of Genes using Novel Algorithms and high-performance Computers) software package. COGNAC uses the new DCM to reduce the phylogenetic tree search space and selects an output tree from the reduced search space based on the MP principle. We test the new DCM using gene order data and inversion distance. The new DCM not only reduces the number of candidate tree topologies but also excludes erroneous tree topologies that the original MP methods can select. Initial labeling of internal genomes affects the accuracy of MP methods using gene order data, and the new DCM enables more accurate initial labeling as well. COGNAC demonstrates superior accuracy as a consequence. We compare COGNAC with FastME and with the combination of the state-of-the-art DCM (Rec-I-DCM3) and GRAPPA. COGNAC clearly outperforms FastME in accuracy. Using the new DCM, COGNAC also reconstructs a much more accurate tree in significantly shorter time than GRAPPA with Rec-I-DCM3.
Highlights
Each cell reports the number of false positives (FP), the number of false negatives (FN), and the execution time, averaged over the computations that finished out of 10 trials using 10 different model trees. In the tables, h, m, and s denote hours, minutes, and seconds, respectively. doi:10.1371/journal.pone.0022483.t001
Each cell reports the number of false positives (FP), the number of false negatives (FN), and the execution time, averaged over the computations that finished out of 10 trials using 10 different model trees. In the tables, h, m, and s denote hours, minutes, and seconds, respectively. doi:10.1371/journal.pone.0022483.t002
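The captions above do not define FP and FN. A common convention in phylogenetics, assumed here and not confirmed by this text, counts a false positive as an internal edge (bipartition of the leaves) present in the reconstructed tree but absent from the model tree, and a false negative as the reverse. The following minimal sketch illustrates that convention; the function names `canonical` and `count_fp_fn` and the set-based input format are hypothetical and are not taken from COGNAC.

```python
def canonical(side, leaves):
    """Return a canonical form of a bipartition given one of its sides.

    A bipartition {S | leaves - S} can be written from either side, so we
    always keep the side that does NOT contain the smallest leaf label.
    """
    side = frozenset(side)
    leaves = frozenset(leaves)
    anchor = min(leaves)
    return leaves - side if anchor in side else side


def count_fp_fn(inferred_splits, model_splits, leaves):
    """Count false positives and false negatives between two trees.

    Assumed convention (not stated in the source):
      FP = bipartitions in the inferred tree that are not in the model tree
      FN = bipartitions in the model tree that are not in the inferred tree
    Each tree is given as an iterable of leaf sets, one per internal edge.
    """
    inferred = {canonical(s, leaves) for s in inferred_splits}
    model = {canonical(s, leaves) for s in model_splits}
    return len(inferred - model), len(model - inferred)


if __name__ == "__main__":
    leaves = "ABCDE"
    model_tree = [{"A", "B"}, {"D", "E"}]      # ((A,B),C,(D,E))
    inferred_tree = [{"A", "C"}, {"D", "E"}]   # ((A,C),B,(D,E))
    print(count_fp_fn(inferred_tree, model_tree, leaves))
```

Under this convention, two fully resolved trees on the same leaves have equal FP and FN counts; the two numbers differ only when one of the trees contains unresolved (multifurcating) nodes.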
Summary
Maximum parsimony (MP) [1,2] methods enumerate candidate trees for the input species and select the most parsimonious tree as an output tree by processing the input species' genome data (such as nucleotide sequence data or gene order data). Even with an efficient branch-and-bound strategy and for a relatively small number of species, MP methods need to evaluate a large number of candidate tree topologies. Ranking different tree topologies is much more expensive for gene order data than for nucleotide sequence data. There is no known algorithm to find the most parsimonious labeling of the internal genomes in a tree to compute the tree's parsimony score if the tree has more than three leaf genomes [3]. As a result, a large number of candidate trees is even more problematic for gene order data.
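To give a sense of how quickly the number of candidate topologies grows, the sketch below counts the distinct unrooted binary tree topologies on n labeled leaves, which is the standard double factorial (2n-5)!!. It assumes the search ranges over unrooted, fully resolved trees; the function name is hypothetical, and the count is an upper bound on the search space rather than the number of trees COGNAC or any branch-and-bound search actually evaluates.

```python
def num_unrooted_binary_topologies(n_leaves: int) -> int:
    """Count unrooted binary tree topologies on n labeled leaves: (2n-5)!!."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    count = 1  # exactly one topology exists for 3 leaves
    # A tree with k-1 leaves has 2(k-1)-3 = 2k-5 edges, and the k-th leaf
    # can be attached to any one of them, giving a factor of 2k-5.
    for k in range(4, n_leaves + 1):
        count *= 2 * k - 5
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, num_unrooted_binary_topologies(n))
```

For 10 species the count is already 2,027,025, which suggests why pruning the search space matters even before the per-tree scoring cost for gene order data is taken into account.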