Abstract

We study two problems in computational phylogenetics. The first is tree compatibility. The input is a collection of phylogenetic trees over different, partially overlapping sets of species. The goal is to find a single phylogenetic tree that displays all the evolutionary relationships implied by the input trees. The second problem is incomplete directed perfect phylogeny (IDPP). The input is a data matrix describing a collection of species by a set of characters, where some of the information is missing. The question is whether the missing information can be filled in so that the resulting matrix can be explained by a phylogenetic tree satisfying certain conditions. We explain the connection between tree compatibility and IDPP and show that a recent tree compatibility algorithm is effectively a generalization of an earlier IDPP algorithm. Both algorithms rely heavily on maintaining the connected components of a graph under a sequence of edge and vertex deletions, for which they use the dynamic connectivity data structure of Holm et al., known as HDT. We present a computational study of algorithms for tree compatibility and IDPP. We show experimentally that replacing HDT with a much simpler data structure (essentially, a single-level version of HDT) improves the performance of both of these algorithms in practice. We give partial empirical and theoretical justifications for this observation.
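
The single-level idea can be sketched as follows. This is a minimal Python illustration, not the authors' code: the class SingleLevelConnectivity and its methods are hypothetical, and, for brevity, the two halves of a split tree are explored by a lockstep breadth-first search over the spanning forest instead of the Euler tour trees that HDT uses. As in HDT, deleting a non-tree edge leaves the spanning forest unchanged, and deleting a tree edge triggers a search for a replacement edge starting from the smaller of the two resulting trees. Unlike HDT, there are no edge levels, so non-tree edges that fail to reconnect the tree are simply rescanned on later deletions.

```python
from collections import deque
from itertools import count


class SingleLevelConnectivity:
    """Hypothetical sketch: decremental connectivity with one spanning forest
    and a replacement-edge search from the smaller side (no levels, no ETTs)."""

    def __init__(self, vertices, edges):
        vertices = list(vertices)
        self.adj = {v: set() for v in vertices}   # all current edges
        self.tree = {v: set() for v in vertices}  # spanning-forest edges only
        self.comp = {}                            # vertex -> component label
        self._labels = count()
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)
        # Build an initial spanning forest by BFS and label its components.
        for s in vertices:
            if s in self.comp:
                continue
            label = next(self._labels)
            self.comp[s] = label
            queue = deque([s])
            while queue:
                x = queue.popleft()
                for y in self.adj[x]:
                    if y not in self.comp:
                        self.comp[y] = label
                        self.tree[x].add(y)
                        self.tree[y].add(x)
                        queue.append(y)

    def connected(self, u, v):
        return self.comp[u] == self.comp[v]

    def delete_edge(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        if v not in self.tree[u]:
            return  # non-tree edge: the spanning forest is unaffected
        self.tree[u].discard(v)
        self.tree[v].discard(u)
        small = self._smaller_side(u, v)
        # Scan edges leaving the smaller side; tree edges stay inside it,
        # so any edge that leaves it is a non-tree replacement edge.
        for x in small:
            for y in self.adj[x]:
                if y not in small:
                    self.tree[x].add(y)
                    self.tree[y].add(x)
                    return
        # No replacement exists: the smaller side becomes a new component.
        label = next(self._labels)
        for x in small:
            self.comp[x] = label

    def delete_vertex(self, x):
        # Deleting a vertex amounts to deleting all of its incident edges.
        for y in list(self.adj[x]):
            self.delete_edge(x, y)

    def _smaller_side(self, u, v):
        """Explore both halves of the split tree in lockstep and return the
        vertex set of whichever half is exhausted first (never the larger)."""
        seen = ({u}, {v})
        frontier = (deque([u]), deque([v]))
        while True:
            for side in (0, 1):
                if not frontier[side]:
                    return seen[side]
                x = frontier[side].popleft()
                for y in self.tree[x]:
                    if y not in seen[side]:
                        seen[side].add(y)
                        frontier[side].append(y)
```

For example, on a triangle a, b, c, deleting the tree edge (a, b) promotes (b, c) to a tree edge, so a and b remain connected; deleting (b, c) afterwards isolates b.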

Highlights

  • A phylogenetic tree is a graphical depiction of the evolutionary history of a collection of taxa

  • The tree compatibility problem is to find a tree T whose taxon set is the union of the taxon sets of the input trees, such that each input tree Ti can be obtained from the restriction of T to the leaf set of Ti through edge contraction

  • BuildNT is closely related to Semple and Steel’s version of BUILD [2]
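
For readers unfamiliar with BUILD, its recursion on rooted trees can be sketched as below. This is a hedged, minimal Python sketch of the classic BUILD scheme, not BuildNT itself and not necessarily Semple and Steel's exact formulation: trees are assumed to be nested tuples with string leaves, and the function names are hypothetical. At each step, two taxa are joined in an auxiliary graph when some input tree, restricted to the current taxon set, places them below the same child of its root; if that graph is connected on two or more taxa, the profile is reported incompatible, and otherwise the procedure recurses on each connected component. The sketch rebuilds the graph from scratch at every step; BuildNT instead maintains connected components under deletions, which is where a dynamic connectivity structure such as HDT comes in.

```python
def leaves(tree):
    """Leaf labels of a tree given as nested tuples with string leaves."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def root_child_clusters(tree, taxa):
    """Leaf sets of the root's children in tree restricted to taxa
    (vertices left with a single surviving child are skipped)."""
    node = tree
    while not isinstance(node, str):
        kids = [(c, leaves(c) & taxa) for c in node]
        kids = [(c, s) for c, s in kids if s]
        if len(kids) != 1:
            return [s for _, s in kids]
        node = kids[0][0]  # only one child survives: descend past it
    return []  # the restriction is a single leaf


def build(profile, taxa=None):
    """Classic BUILD recursion: return a nested-tuple tree displaying every
    tree in profile, or None if the profile is reported incompatible."""
    if taxa is None:
        taxa = set().union(*(leaves(t) for t in profile))
    if len(taxa) == 1:
        return next(iter(taxa))
    parent = {x: x for x in taxa}  # union-find over the current taxa

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Join taxa that some tree places below the same child of its root.
    for t in profile:
        for cluster in root_child_clusters(t, taxa):
            first, *rest = cluster
            for y in rest:
                parent[find(y)] = find(first)
    blocks = {}
    for x in taxa:
        blocks.setdefault(find(x), set()).add(x)
    if len(blocks) == 1:
        return None  # connected on two or more taxa: incompatible
    children = []
    for block in blocks.values():
        subtree = build(profile, block)
        if subtree is None:
            return None
        children.append(subtree)
    return tuple(children)
```

For instance, the profile consisting of (("a", "b"), "c") and (("a", "c"), "b") is rejected, because the first tree joins a with b and the second joins a with c.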


Summary

Introduction

A phylogenetic tree is a graphical depiction of the evolutionary history of a collection of taxa (typically species or genes). In the tree compatibility problem, the input is a profile P of phylogenetic trees whose taxon sets partially overlap. The goal is to find a tree T whose taxon set is the union of the taxon sets of the input trees, such that each input tree Ti can be obtained from the restriction of T to the leaf set of Ti through edge contraction. If such a tree T exists, P is said to be compatible; otherwise, P is incompatible. A profile of rooted trees is effectively a collection of unrooted trees that share a common root taxon; this observation establishes the connection between rooted tree compatibility and IDPP. Our empirical results show that, in this setting, simple data structures perform better than more sophisticated ones with better asymptotic bounds.
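
One way to make the connection with IDPP concrete is to encode a profile of rooted trees as a matrix with entries 0, 1, and ?. The sketch below assumes a cluster-to-character encoding chosen for illustration, which may differ from the construction used in the paper: every non-root internal node of an input tree Ti becomes a character, taxa below that node get 1, the other taxa of Ti get 0, and taxa absent from Ti get ?. The function names are hypothetical. The IDPP check is brute force and exponential in the number of ? entries; it relies on the standard criterion that a complete binary matrix admits a directed perfect phylogeny with an all-zero root exactly when the per-character sets of taxa in state 1 are pairwise disjoint or nested.

```python
from itertools import combinations, product


def leaves(tree):
    """Leaf labels of a tree given as nested tuples with string leaves."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def internal_clusters(tree, is_root=True):
    """Leaf sets of the non-root internal nodes of tree."""
    if isinstance(tree, str):
        return []
    found = [] if is_root else [leaves(tree)]
    for child in tree:
        found += internal_clusters(child, is_root=False)
    return found


def profile_to_matrix(profile):
    """Hypothetical encoding: one character (column) per non-root internal node."""
    taxa = sorted(set().union(*(leaves(t) for t in profile)))
    columns = []
    for t in profile:
        span = leaves(t)
        for cluster in internal_clusters(t):
            columns.append({x: 1 if x in cluster else 0 if x in span else "?"
                            for x in taxa})
    return taxa, columns


def is_laminar(sets):
    """True if every two sets are disjoint or nested."""
    return all(not (a & b) or a <= b or b <= a for a, b in combinations(sets, 2))


def has_dpp_completion(taxa, columns):
    """Brute-force IDPP: try every 0/1 filling of the '?' entries."""
    holes = [(j, x) for j, col in enumerate(columns) for x in taxa if col[x] == "?"]
    for bits in product((0, 1), repeat=len(holes)):
        filled = [dict(col) for col in columns]
        for (j, x), bit in zip(holes, bits):
            filled[j][x] = bit
        ones = [{x for x in taxa if col[x] == 1} for col in filled]
        if is_laminar(ones):
            return True
    return False
```

On the compatible profile (("a", "b"), "c"), (("a", "b"), "d") a completion exists, whereas the incompatible pair (("a", "b"), "c"), (("a", "c"), "b") yields two overlapping, non-nested characters that no completion can fix.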

Background
Contributions
Contents
Graphs and Phylogenetic Trees
Spanning Forests and Euler Tour Trees
Edge Deletion in HDT
Level Truncation
Tree Compatibility
The Display Graph
Incomplete Directed Perfect Phylogeny
The Relationship between Tree Compatibility and IDPP
Experiments with Tree Compatibility
Real Datasets
Generating Simulated Data
Impact of Level Truncation
Worst-Case Time versus Empirically-Observed Time
Performance on Profiles of More General Phylogenetic Trees
Connectivity Testing versus Maintaining Semi-Universal Labels
Experiments with IDPP
Simulated Datasets
Solving IDPP via Tree Compatibility
Analysis
The Impact of Deleting Non-Tree Edges
The Number of Edges Scanned
The Size of the Smaller Component
Findings
Discussion