Abstract
We study two problems in computational phylogenetics. The first is tree compatibility. The input is a collection P of phylogenetic trees over different, partially overlapping sets of species. The goal is to find a single phylogenetic tree that displays all the evolutionary relationships implied by P. The second problem is incomplete directed perfect phylogeny (IDPP). The input is a data matrix describing a collection of species by a set of characters, where some of the information is missing. The question is whether there exists a way to fill in the missing information so that the resulting matrix can be explained by a phylogenetic tree satisfying certain conditions. We explain the connection between tree compatibility and IDPP and show that a recent tree compatibility algorithm is effectively a generalization of an earlier IDPP algorithm. Both algorithms rely heavily on maintaining the connected components of a graph under a sequence of edge and vertex deletions, for which they use the dynamic connectivity data structure of Holm et al., known as HDT. We present a computational study of algorithms for tree compatibility and IDPP. We show experimentally that substituting HDT with a much simpler data structure (essentially, a single-level version of HDT) improves the performance of both algorithms in practice. We give partial empirical and theoretical justifications for this observation.
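Both algorithms need to answer connectivity queries while edges and vertices are deleted. The Python sketch below illustrates, under our own simplifying assumptions, what a single-level scheme can look like: it keeps a spanning forest plus the remaining non-tree edges, and when a forest edge is deleted it explores the smaller of the two resulting trees and scans that side's non-tree edges for a replacement. It is not HDT and not necessarily the structure used in the paper. The class name DecrementalConnectivity is hypothetical, and component labels are stored explicitly (relabelling the smaller side on a split) instead of in Euler tour trees.

```python
from collections import deque


class DecrementalConnectivity:
    """Hypothetical single-level decremental connectivity sketch (not HDT).

    Keeps a spanning forest (tree_adj), the remaining non-tree edges
    (nontree_adj), and an explicit component label per vertex (comp).
    """

    def __init__(self, vertices, edges):
        vertices = list(vertices)
        self.tree_adj = {x: set() for x in vertices}
        self.nontree_adj = {x: set() for x in vertices}
        adj = {x: set() for x in vertices}
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        self.comp = {}
        self._next_id = 0
        # BFS from every unvisited vertex builds the initial spanning forest.
        for s in vertices:
            if s in self.comp:
                continue
            cid = self._new_label()
            self.comp[s] = cid
            queue = deque([s])
            while queue:
                x = queue.popleft()
                for y in adj[x]:
                    if y not in self.comp:
                        self.comp[y] = cid
                        self.tree_adj[x].add(y)
                        self.tree_adj[y].add(x)
                        queue.append(y)
                    elif y not in self.tree_adj[x]:
                        self.nontree_adj[x].add(y)

    def _new_label(self):
        self._next_id += 1
        return self._next_id

    def connected(self, u, v):
        return self.comp[u] == self.comp[v]

    def _smaller_side(self, u, v):
        # Grow BFS trees from u and v in lockstep inside the forest; the side
        # whose queue empties first is the smaller one (ties go to u).
        seen = [{u}, {v}]
        queues = [deque([u]), deque([v])]
        while True:
            for i in (0, 1):
                if not queues[i]:
                    return seen[i]
                x = queues[i].popleft()
                for y in self.tree_adj[x]:
                    if y not in seen[i]:
                        seen[i].add(y)
                        queues[i].append(y)

    def delete_edge(self, u, v):
        if v in self.nontree_adj[u]:
            # Deleting a non-tree edge never changes connectivity.
            self.nontree_adj[u].discard(v)
            self.nontree_adj[v].discard(u)
            return
        if v not in self.tree_adj[u]:
            return  # edge not present
        self.tree_adj[u].discard(v)
        self.tree_adj[v].discard(u)
        small = self._smaller_side(u, v)
        # Single level: non-tree edges that fail here are simply rescanned
        # on later deletions (HDT would push them to a higher level).
        replacement = next(((x, y) for x in small for y in self.nontree_adj[x]
                            if y not in small), None)
        if replacement is not None:
            x, y = replacement
            self.nontree_adj[x].discard(y)
            self.nontree_adj[y].discard(x)
            self.tree_adj[x].add(y)
            self.tree_adj[y].add(x)
            return
        # No replacement: the component splits, so relabel the smaller side.
        cid = self._new_label()
        for x in small:
            self.comp[x] = cid

    def delete_vertex(self, x):
        # A vertex deletion is modelled as deleting all its incident edges.
        for y in list(self.tree_adj[x] | self.nontree_adj[x]):
            self.delete_edge(x, y)
```

Without HDT's edge levels, a non-tree edge that fails to reconnect the two sides may be rescanned on many later deletions, so this version loses HDT's O(log^2 n) amortized update bound; the abstract's observation is that simpler schemes of this general kind can nonetheless be faster in practice.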
Highlights
A phylogenetic tree is a graphical depiction of the evolutionary history of a collection of taxa
The tree compatibility problem is to find a tree T whose taxon set is the union of the taxon sets of the input trees, such that each input tree Ti can be obtained from the restriction of T to the leaf set of Ti through edge contraction
BuildNT is closely related to Semple and Steel’s version of BUILD [2]
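As a rough illustration of the BUILD recursion mentioned in the highlights, the Python sketch below follows the textbook form of Semple and Steel's version for rooted trees as we understand it. Trees are nested tuples with string leaves (a representation chosen only for this example). At each step every input tree is restricted to the current taxon set, two taxa are joined whenever they lie below the same child of the root in some input tree, and the algorithm recurses on the connected components; a single component means the profile is incompatible. The names leaves, restrict, and build are hypothetical, and this is not BuildNT.

```python
def leaves(t):
    """Leaf labels of a rooted tree given as nested tuples with str leaves."""
    if isinstance(t, str):
        return {t}
    return set().union(*(leaves(c) for c in t))


def restrict(t, keep):
    """Restrict t to the labels in keep, suppressing nodes left with one child."""
    if isinstance(t, str):
        return t if t in keep else None
    kids = [r for r in (restrict(c, keep) for c in t) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(kids)


def build(trees, labels):
    """BUILD-style compatibility test; returns a tree, or None if incompatible."""
    labels = set(labels)
    if len(labels) == 1:
        return next(iter(labels))
    parent = {x: x for x in labels}  # union-find over the current taxa

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    restricted = [r for r in (restrict(t, labels) for t in trees) if r is not None]
    # Join all taxa that lie below the same child of the root in some tree.
    for r in restricted:
        if isinstance(r, str):
            continue
        for child in r:
            members = sorted(leaves(child))
            for y in members[1:]:
                parent[find(y)] = find(members[0])
    groups = {}
    for x in sorted(labels):
        groups.setdefault(find(x), []).append(x)
    if len(groups) == 1:
        return None  # graph is connected: the root cannot be split
    subtrees = []
    for comp in groups.values():
        sub = build(restricted, comp)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)
```

For example, build([(("a", "b"), "c"), (("b", "c"), "d")], {"a", "b", "c", "d"}) returns (((“a”, “b”), “c”), “d”) in this representation. This naive version recomputes the graph from scratch at every level of the recursion, whereas the algorithms studied in the paper maintain connected components under deletions, which is where HDT or a simpler structure comes in.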
Summary
A phylogenetic tree is a graphical depiction of the evolutionary history of a collection of taxa (typically species or genes). In the tree compatibility problem, the input is a profile P of phylogenetic trees, and the goal is to find a tree T whose taxon set is the union of the taxon sets of the input trees, such that each input tree Ti can be obtained from the restriction of T to the leaf set of Ti through edge contraction. If such a tree T exists, P is said to be compatible; otherwise, P is incompatible. Since a profile of rooted trees is effectively a collection of unrooted trees that share a common root taxon, this observation connects rooted tree compatibility to IDPP. Our empirical results show that, for these algorithms, simple data structures perform better than more sophisticated ones with better asymptotic bounds.
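For rooted trees, the compatibility condition above can be tested through clusters (the leaf sets below each node). Restricting T to a leaf set L turns every cluster C of T into C ∩ L, and contracting an edge deletes exactly one cluster, so Ti can be obtained from T in this way exactly when every cluster of Ti appears among the nonempty sets C ∩ L(Ti). The Python sketch below encodes this test, assuming rooted trees written as nested tuples with string leaves; clusters and displays are hypothetical helper names.

```python
def clusters(t):
    """Return the set of clusters (leaf sets below each node) of a rooted tree."""
    out = set()

    def walk(node):
        if isinstance(node, str):
            c = frozenset([node])
        else:
            c = frozenset().union(*(walk(ch) for ch in node))
        out.add(c)
        return c

    walk(t)
    return out


def displays(t, ti):
    """True if ti can be obtained from t restricted to ti's leaves by
    contracting edges (rooted trees as nested tuples, str leaves)."""
    ti_clusters = clusters(ti)
    leaf_set = frozenset().union(*ti_clusters)
    # Clusters of the restriction of t to ti's leaf set.
    restricted = {c & leaf_set for c in clusters(t)}
    restricted.discard(frozenset())
    # A taxon of ti missing from t also fails here, because its
    # singleton cluster cannot appear in `restricted`.
    return ti_clusters <= restricted
```

A profile P is compatible if some tree T on the union of the taxon sets satisfies displays(T, Ti) for every Ti in P; this check only verifies a candidate T and does not search for one.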