Abstract

We investigated the triples distance as a measure of the distance between two rooted bifurcating phylogenetic trees. The triples distance counts the number of subtrees of three taxa that are different in the two trees. Exact expressions are given for the mean and variance of the sampling distribution of this distance measure. A normal approximation is also proved under the class of label-invariant models for the distribution of trees. The theory is applied to the use of the triples distance as a statistic for testing the null hypothesis that the similarities in two trees can be explained by independent random structures. In an example, two phylogenies that describe the same seven species of chlorococcalean zoosporic green algae are compared: one phylogeny based on morphological characteristics and one based on ribosomal RNA gene sequence data. (Tree comparison metrics; random trees; label-invariant models; hypothesis test.)

Developing interpretable measures of the distance between trees and of their sampling distributions under various probability models is important to the study of phylogenetic inference. Distance measures are a valuable tool for comparing phylogenetic trees created from two or more sources of data (e.g., Penny et al., 1982; Bledsoe and Raikow, 1990; Penny et al., 1991; Swofford, 1991; Estabrook, 1992), for reporting the results of a bootstrap analysis or of a comparison of phylogeny algorithms (e.g., Kuhner and Felsenstein, 1994), for making confidence statements about a proposed phylogeny, and for examining subtrees of particular taxa. As an example of the first use, consider the two phylogenies presented in Figure 1 for seven species of chlorococcalean zoosporic green algae.
Figure 1a shows a rooted bifurcating tree based on an assessment of certain morphological characteristics (primarily the details of the flagellar apparatus of motile cells), and Figure 1b shows the parsimony tree based on ribosomal RNA gene (rDNA) sequence data (Wilcox et al., 1992). Can we quantify the difference between the trees? Can the similarities in the two trees be explained by random chance? In this paper, we propose using the number of subtrees of three taxa that are different in the two trees as a measure of the distance between them. To answer the second question, we find the mean and variance of this statistic, along with its asymptotic distribution, under the model that the two trees have completely independent structures.

Several metrics for comparing phyloge-
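
To make the definition of the triples distance concrete, the following is a minimal Python sketch rather than the authors' implementation. It assumes each rooted bifurcating tree is encoded as nested 2-tuples of taxon labels, an encoding chosen here purely for illustration. The function names and the four-taxon example trees are hypothetical. For every set of three taxa, the sketch records which pair is more closely related in each tree and counts the three-taxon sets on which the two trees disagree.

```python
from itertools import combinations
from math import comb


def collect_triples(tree, triples):
    """Return the leaf set of `tree` and add its resolved triples to `triples`.

    A tree is either a taxon label (a leaf) or a 2-tuple of subtrees.
    A resolved triple (x, y, z) means x and y are closer to each other
    than either is to z. This encoding is an assumption of this sketch.
    """
    if not isinstance(tree, tuple):
        return frozenset([tree])
    left, right = tree
    a = collect_triples(left, triples)
    b = collect_triples(right, triples)
    # Two taxa on one side of this node and one on the other side form a
    # three-taxon subtree whose root is this node.
    for inside, outside in ((a, b), (b, a)):
        for x, y in combinations(sorted(inside), 2):
            for z in outside:
                triples.add((x, y, z))
    return a | b


def triples_distance(tree1, tree2):
    """Count the three-taxon subtrees that differ between two trees."""
    t1, t2 = set(), set()
    leaves1 = collect_triples(tree1, t1)
    leaves2 = collect_triples(tree2, t2)
    if leaves1 != leaves2:
        raise ValueError("both trees must have the same taxa")
    # In a bifurcating tree every set of three taxa is resolved, so the
    # distance is the total number of such sets minus the shared ones.
    return comb(len(leaves1), 3) - len(t1 & t2)


# Hypothetical four-taxon example: the trees disagree only on {A, B, C}.
tree_a = ((("A", "B"), "C"), "D")
tree_b = ((("A", "C"), "B"), "D")
print(triples_distance(tree_a, tree_b))  # prints 1
```

This direct enumeration takes time on the order of the cube of the number of taxa, which is ample for a seven-species comparison such as the one described above.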
