Abstract
The standard approach to phylogeny estimation uses two phases: the first produces an alignment of a set of homologous sequences, and the second estimates a tree on that multiple sequence alignment. POY, a method that seeks a tree/alignment pair minimizing the total treelength, is the most widely used alternative to this two-phase approach. The topological accuracy of trees computed under treelength optimization is, however, controversial. In particular, one study showed that treelength optimization using simple gap penalties produced poor trees and alignments, and suggested that POY used with an affine gap penalty might be competitive with the best two-phase methods. In this paper we report on a study addressing this possibility. We present a new heuristic for treelength, called BeeTLe (Better Treelength), that is guaranteed to produce trees at least as short as those produced by POY. We then use this heuristic to analyze a large number of simulated and biological datasets, and compare the resulting trees and alignments to those produced using POY, and also to maximum likelihood (ML) and maximum parsimony (MP) trees computed on a number of alignments. In general, we find that trees produced by BeeTLe are shorter and more topologically accurate than POY trees, but that neither POY nor BeeTLe produces trees as topologically accurate as ML trees produced on standard alignments. These findings, taken as a whole, suggest that treelength optimization is not as good an approach to phylogenetic tree estimation as maximum likelihood based upon good alignment methods.
Highlights
Most phylogenies are estimated in two steps: first a multiple sequence alignment is produced, and then a tree is estimated on that alignment
We report on a study comparing BeeTLe used with three treelength criteria (Affine and two treelength criteria that are based upon simple gap penalty treatments) to POY, two-phase methods, and SATe, a method for co-estimating alignments and trees
We examined the question of whether optimizing treelength can return trees that are competitive with the better two-phase methods with respect to topological accuracy
Summary
Most phylogenies are estimated in two steps: first a multiple sequence alignment is produced, and then a tree is estimated on that alignment. We developed a very simple heuristic, BeeTLe (Better Treelength), with the following structure: BeeTLe runs a collection of methods, including POY, to produce a set of trees on a given input set of unaligned sequences, uses POY to compute the treelength of each tree, and returns the tree with the shortest treelength. Although several algorithms have been developed for the GSP problem (both for the fixed-tree and general cases), POY is the standard method used to produce trees from unaligned sequences via treelength optimization. The Affine treelength criterion studied in Liu et al. [31], which produced more accurate trees than Simple-1 or Simple-2, sets the cost of a gap of length L to 4+L.
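The selection step described above can be sketched in a few lines of Python. This is only an illustration of the "score every candidate, keep the shortest" structure, not the authors' implementation: the function names, the candidate labels, and the toy treelength values are all hypothetical, and the `treelength` callable stands in for the POY fixed-tree scoring run that BeeTLe actually uses. The gap-cost helper assumes the Affine cost is read as 4 + L, that is, a gap-open cost of 4 plus 1 per gapped position.

```python
from typing import Callable, Dict, Tuple


def affine_gap_cost(length: int, gap_open: int = 4, gap_extend: int = 1) -> int:
    """Cost of one gap spanning `length` positions.

    Assumption: the Affine criterion charges gap_open + gap_extend * length,
    which gives 4 + L with the default parameters.
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return gap_open + gap_extend * length


def select_shortest_tree(
    candidates: Dict[str, str],
    treelength: Callable[[str], float],
) -> Tuple[str, str, float]:
    """Return (method name, tree, treelength) for the shortest candidate.

    `candidates` maps a method label to the tree it produced (Newick text).
    `treelength` is a placeholder for the scorer; in BeeTLe that role is
    played by POY evaluating each fixed tree.
    """
    if not candidates:
        raise ValueError("need at least one candidate tree")
    scored = [(treelength(tree), name, tree) for name, tree in candidates.items()]
    length, name, tree = min(scored)  # smallest treelength wins
    return name, tree, length


# Toy example: labels and scores are made up purely for illustration.
candidates = {
    "poy": "((A,B),(C,D));",
    "ml_on_alignment": "((A,C),(B,D));",
}
toy_lengths = {
    "((A,B),(C,D));": 120.0,
    "((A,C),(B,D));": 112.0,
}

best_name, best_tree, best_length = select_shortest_tree(
    candidates, toy_lengths.__getitem__
)
print(best_name, best_tree, best_length)
```

Because the POY tree is always one of the candidates, the tree returned by this kind of selection can never be longer than the POY tree, which is the guarantee stated for BeeTLe.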