Abstract
A randomization procedure is proposed to determine if sets of data used for phylogenetic analysis contain phylogenetically nonrandom information. The method compares the observed number of steps on a minimum length tree with the mean number of steps on minimum length trees derived from the same data set after character state assignments have been randomly permuted within each character. Such randomized data sets will exhibit exactly the same character state distributions as the original data but no phylogenetic information. In tests of 28 separate data sets using this procedure, the minimum lengths of each data set differed significantly from that expected for phylogenetically non-informative data in spite of the fact that observed consistency indices from the original data were as low as 0.230. The high correlation between number of steps per character on minimum length trees and number of taxa among the 28 original data sets is consistent with that expected if a more or less constant frequency of homoplasy occurs per character per taxon. This correlation implies that the consistency index may be an inappropriate comparative measure of homoplasy among data sets. The observed pattern of increasing homoplasy with increasing numbers of taxa for the original data sets is curvilinear (when forced to pass through a fixed point for all data sets). This is qualitatively different from that expected for random data. Possible uses of the randomization techniques are suggested in cladistic studies using either compatibility analysis or parsimony.

[Phylogeny; randomization test; parsimony; minimum length trees; consistency index; homoplasy.]

The number of steps implied to have occurred on minimum length trees is an important statistic in the analysis of systematic data for phylogenetic inference.
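The permutation step described in the abstract can be illustrated with a minimal Python sketch. It assumes the data are held as a taxa-by-characters list of lists; the names `permute_within_characters`, `state_counts`, `mean_randomized_length`, and the toy matrix are hypothetical and not taken from the paper. Finding the minimum length tree is not implemented here: `tree_length_fn` is a placeholder for whatever parsimony routine the reader supplies.

```python
import random
from collections import Counter


def permute_within_characters(matrix, rng):
    """Return a new taxa-by-characters matrix in which the states of each
    character (column) are shuffled independently among the taxa (rows)."""
    n_taxa = len(matrix)
    n_chars = len(matrix[0])
    shuffled_columns = []
    for j in range(n_chars):
        column = [matrix[i][j] for i in range(n_taxa)]
        rng.shuffle(column)  # reassign this character's states at random
        shuffled_columns.append(column)
    # Transpose the shuffled columns back into rows (one row per taxon).
    return [[shuffled_columns[j][i] for j in range(n_chars)]
            for i in range(n_taxa)]


def state_counts(matrix):
    """Count how often each state occurs in each character."""
    return [Counter(row[j] for row in matrix) for j in range(len(matrix[0]))]


def mean_randomized_length(matrix, tree_length_fn, n_reps, rng):
    """Average tree_length_fn over n_reps permuted copies of the matrix.
    tree_length_fn is an assumed, user-supplied function that returns the
    number of steps on a minimum length tree for a given matrix."""
    total = 0
    for _ in range(n_reps):
        total += tree_length_fn(permute_within_characters(matrix, rng))
    return total / n_reps


# Hypothetical toy data: 5 taxa (rows) by 4 binary characters (columns).
observed = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
]

rng = random.Random(0)
randomized = permute_within_characters(observed, rng)
```

Because each column is only reordered, `state_counts(randomized)` equals `state_counts(observed)`, which is the property the abstract relies on: the randomized data keep the original character state distributions while the association among characters is broken.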
For a given number of characters and taxa, as the number of steps necessary to explain the distribution of character states among a group of taxa increases, that is, as total homoplasy increases, the investigator's confidence in both the resulting cladogram and the data used in the analysis decreases. Although a variety of measures of homoplasy on minimum length trees have been proposed, the consistency index of Kluge and Farris (1969), which is equivalent to the reciprocal of the number of steps per character for binary characters, is the most frequently used measure of homoplasy. Phylogenetic hypotheses obtained from data sets that exhibit high consistency levels as measured by the consistency index (values near 1.0) are generally given more credence than are the results from data sets with low values of the consistency index (values near the limiting value of 0.0).

1 Current address: Department of Biology, California State University, Long Beach, CA 90840.

In our studies of methods for coding quantitative characters for phylogenetic data analysis, it became apparent that many of these data sets showed very low consistency (high homoplasy). Often consistency index values as low as 0.3 were observed. As a result, we began to question whether there was any useful phylogenetic information in particular data sets as evidenced by the number of steps on the minimum length tree or by the magnitude of the consistency index. Although Rohlf and Fisher (1968) examined the distribution of the cophenetic correlation coefficient for random phenotypic data in order to test for hierarchical structure using phenetic cluster analysis, no comparable procedure has yet been proposed (except that of Archie and Felsenstein, 1989) for phylogenetic hypotheses derived using parsimony. The purposes of the study presented here are 1) to develop a randomization procedure that can be used to determine the distribution of the number of steps on minimum length trees for phylogenetically random