Abstract

Wilkinson et al. (2003) took issue with Farris et al. (2001) over whether the branch lengths of phylogenetic trees can be used to indicate support for the component clades. I do not wish to enter into that debate here; instead, I point out that some of the details discussed by Wilkinson et al. are apparently based on a misinterpretation of the information provided by Farris et al. In particular, two pieces of information presented by Farris et al. are apparently ambiguous and have led Wilkinson et al. into some incorrect arguments. This does not necessarily change the substantive conclusions reached by Wilkinson et al., but it does change the details of their argument.

Figure 1 of Farris et al. (2001) shows part of an artificial DNA sequence data matrix (for taxa labeled A–H), along with two optimal phylogenetic trees derived from analysis of the complete data matrix (which contains 60 repetitions of the small matrix shown). The analysis was performed using the default settings of the DNAML program of Felsenstein (1993), so the trees have maximum likelihood under the chosen evolutionary model. Farris et al. (2001:298) claimed that the consensus tree for the data is “unresolved,” whereas Wilkinson et al. (2003) pointed out that the strict component consensus of the two trees shown in figure 1 of Farris et al. does in fact have one informative component. Interestingly, neither statement is contradicted by the information provided by Farris et al. This is because Wilkinson et al. based their discussion solely on the trees presented by Farris et al., and these trees do not accurately reflect the data matrix.

First, the trees as shown by Farris et al. (2001) may be superficially interpreted as fully resolved binary trees. However, this interpretation is incorrect, because not all of the internodes shown on the trees represent evolutionary branches. In the text, Farris et al. (2001:298) explicitly stated, “With the exception of the ABEF/CDGH split, however, all the interior branches of both trees have the same length, about 0.16.” Note that four of the internodes shown on the trees are referred to as “branches” with a specified branch length, whereas the fifth internode is referred to as a “split” with no specified branch length. Inspection of the data matrix reveals that this fifth internode represents a split (or bipartition) that is neither supported nor contradicted by the sequence data. In other words, if the internode is interpreted as a branch, then it has zero length. It is perhaps worth noting that DNAML actually reports this internode as having a length of 0.00006, but the likelihood-ratio test makes it clear that this estimate is not significantly different from zero. It would therefore be more accurate to represent this part of both trees as a four-way polytomy; the chosen presentation is ambiguous because the trees as shown involve an apparently arbitrary resolution of a polytomy. Although phylogenetic trees showing zero-length branches are not unknown in the literature (e.g., fig. 5 of Kluge and Farris, 1969, has two such branches), such unsupported resolutions of multifurcations have been severely criticized in the context of searching for and presenting phylogenetic trees (e.g., Nixon and Carpenter, 1996; Farris and Källersjö, 1998). In this case, quite literally, apparent branch lengths (on the diagram) do not indicate real support (in the data).

Second, the context invites the interpretation that the two trees presented by Farris et al. (2001) are the only optimal trees that can be derived from the data matrix, because Farris et al. (2001:298) stated that “this matrix has two different maximum-likelihood trees, as shown.” In fact, the matrix has four different trees with the same likelihood under this evolutionary model, as revealed by analysis of the data using the PAUP* program (Swofford, 2002). Use of this program obviates the methodological difficulties referred to by Farris et al. (2001:298), because the data matrix is small enough to be analyzed using the exhaustive search option (no branch-and-bound procedure is known for maximum likelihood, and PAUP* defaults to performing an exhaustive search if this option is chosen). That all four trees are optimal was independently confirmed by supplying them to DNAML as user trees under the default settings (the strategy used by Farris et al.). The four maximum-likelihood trees are shown here in Figure 1. There are only eight splits (or bipartitions)

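The split-based reasoning in this comment (strict component consensus, and collapsing a branch of effectively zero length into a polytomy) can be sketched in Python. This is a minimal illustration under stated assumptions, not a reconstruction of the trees in Farris et al. (2001): the two eight-taxon trees are hypothetical, the lengths merely echo the values quoted in the text (about 0.16 and 0.00006), and a fixed cut-off `tol` stands in for the likelihood-ratio test that should actually decide whether a branch length differs from zero. All function names are ours. An unrooted binary tree on n taxa has n − 3 internal branches, five for taxa A–H.

```python
from itertools import combinations

# Hypothetical taxa and trees for illustration only; these are NOT the
# trees published by Farris et al. (2001).
TAXA = frozenset("ABCDEFGH")


def canonical(side, taxa=TAXA):
    """Encode the split X|X' by the side lacking the first taxon ('A'),
    so that X|X' and X'|X get the same key."""
    side = frozenset(side)
    return taxa - side if min(taxa) in side else side


def show(split, taxa=TAXA):
    """Format a split with the 'A' side first, e.g. 'ABEF|CDGH'."""
    return "".join(sorted(taxa - split)) + "|" + "".join(sorted(split))


def compatible(x, y, taxa=TAXA):
    """Two splits can coexist on one tree iff some pair of their sides is disjoint."""
    return any(not (p & q) for p in (x, taxa - x) for q in (y, taxa - y))


def is_binary_tree(splits, taxa=TAXA):
    """n - 3 distinct, nontrivial, pairwise-compatible splits define a
    fully resolved unrooted tree."""
    return (
        len(splits) == len(taxa) - 3
        and all(1 < len(s) < len(taxa) - 1 for s in splits)
        and all(compatible(x, y, taxa) for x, y in combinations(splits, 2))
    )


def collapse(tree, tol):
    """Keep only internal branches longer than tol (a hypothetical cut-off
    standing in for a proper likelihood-ratio test); the rest become polytomies."""
    return {split for split, length in tree.items() if length > tol}


def strict_consensus(split_sets):
    """Strict component consensus: splits present in every input tree."""
    return set.intersection(*(set(s) for s in split_sets))


# Internal branch lengths keyed by split; the topologies are invented.
tree1 = {canonical(s): 0.16 for s in ("AB", "EF", "CD", "GH")}
tree1[canonical("ABEF")] = 0.00006
tree2 = {canonical(s): 0.16 for s in ("AE", "BF", "CD", "GH")}
tree2[canonical("ABEF")] = 0.00006

assert is_binary_tree(set(tree1)) and is_binary_tree(set(tree2))

binary = strict_consensus([tree1, tree2])
collapsed = strict_consensus([collapse(tree1, 1e-4), collapse(tree2, 1e-4)])
print("binary trees:  ", sorted(map(show, binary)))
print("after collapse:", sorted(map(show, collapsed)))
```

In this hypothetical case the two binary trees share three components, but one of them (ABEF|CDGH) is shared only because both trees resolve the same zero-length polytomy in the same arbitrary way; once that branch is collapsed, the strict consensus retains only the two supported components. Whether the published case behaves the same way depends on the real topologies, which are not reproduced here.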
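The statement that the eight-taxon matrix is small enough for an exhaustive search can be checked with standard combinatorics: n labelled taxa admit (2n − 5)!! distinct unrooted, fully resolved trees. The function below is a minimal sketch of that count (its name is ours, for illustration); it says nothing about how PAUP* itself organizes the search.

```python
def num_unrooted_binary_trees(n: int) -> int:
    """Number of distinct unrooted, fully resolved trees on n labelled taxa.

    Adding the k-th taxon to a tree on k - 1 taxa can be done on any of its
    2k - 5 branches, so the total is 3 * 5 * ... * (2n - 5) = (2n - 5)!!.
    """
    if n < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for branches in range(3, 2 * n - 4, 2):  # 3, 5, ..., 2n - 5
        count *= branches
    return count


if __name__ == "__main__":
    # Taxa A-H in the matrix discussed above: n = 8.
    print(num_unrooted_binary_trees(8))  # 10395
```

Scoring 10,395 topologies is trivial for a single small matrix; because the count grows super-exponentially with n, exhaustive searches are practical only for small numbers of taxa.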