Abstract

Archie (1989) and Faith and Cranston (1991) independently developed a parsimony-based randomization test for assessing the quality of a phylogenetic data matrix. Matrix randomization tests have had a mixed reception from phylogeneticists (e.g., Källersjö et al., 1992; Alroy, 1994; Carpenter et al., 1998; Wilkinson, 1998; Siddall, 2001). In general, however, these are well-founded statistical techniques (Manly, 1991) that may be well-suited to phylogenetic contexts where models or assumptions underlying parametric statistical methods are either difficult to justify or to test. In a matrix randomization test, a test statistic (typically a measure of data “quality”) is calculated for the original data, and the result is contrasted against a null distribution of the test statistic determined by repeated randomization of the data. Randomization is by random permutation of the assignment of character states to taxa within each character. Essentially, each character in the data set is independently shuffled so that congruence between characters is reduced to the extent that would be expected by chance alone. The random permutation preserves some features of the data that are known to affect measures of data quality, such as the total number of characters and taxa and the numbers of taxa with each character state within each character (Archie, 1989; Sanderson and Donoghue, 1989; Faith and Cranston, 1991). Thus the null distribution represents a distribution that one would expect from comparable phylogenetically uninformative data. The simplest parsimony-based matrix randomization tests use the length of the most-parsimonious trees (MPTs) as the test statistic, comparing this for real and randomly permuted data. A corresponding simple test statistic for the null hypothesis that the data are indistinguishable from random is the parsimony permutation tail probability or parsimony PTP (Faith and Cranston, 1991).
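The shuffling step and the tail-probability bookkeeping can be illustrated with a minimal Python sketch. It is not the implementation used in any of the cited studies. It assumes the matrix is a list of rows (taxa) by columns (characters), and it leaves the tree-length calculation to a caller-supplied function. The names `permute_matrix`, `ptp`, and `statistic` are hypothetical and introduced only for illustration. The tail probability is computed as the proportion of data sets, real and permuted, whose statistic is as small as or smaller than that of the original data.

```python
import random


def permute_matrix(matrix, rng):
    """Return a copy of the matrix with each character shuffled across taxa.

    Rows are taxa and columns are characters. Each column is permuted
    independently, so the number of taxa with each state in each
    character is unchanged.
    """
    n_taxa = len(matrix)
    n_chars = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_chars)]
    for column in columns:
        rng.shuffle(column)  # in-place shuffle of one character
    return [[column[i] for column in columns] for i in range(n_taxa)]


def ptp(matrix, statistic, n_perm=99, seed=0):
    """Permutation tail probability for a caller-supplied statistic.

    `statistic` is a hypothetical stand-in for the length of the
    most-parsimonious trees; smaller values mean shorter trees.
    """
    rng = random.Random(seed)
    observed = statistic(matrix)
    # The real data set is counted among the data sets, so start at 1.
    as_short_or_shorter = 1
    for _ in range(n_perm):
        if statistic(permute_matrix(matrix, rng)) <= observed:
            as_short_or_shorter += 1
    return as_short_or_shorter / (n_perm + 1)
```

In practice `statistic` would wrap a parsimony tree search. With 99 permutations the smallest value this sketch can return is 0.01, because the original data set always counts toward the total.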
The parsimony PTP is the proportion of data sets (real and randomly permuted) that yield MPTs as short or shorter than the MPTs for the original data. Slowinski and Crother (1998) used 40 real data sets in an empirical evaluation of the utility of the parsimony PTP. Specifically, they compared PTPs with the fraction of clades supported by bootstrap proportions exceeding 50%. In addition, they compared PTPs with the resolution of strict component consensus trees. They reported that data sets that appear to be poorly structured, based on bootstrap analyses or because they have a poorly resolved strict component consensus, tend to have significant PTPs, and they concluded that (p. 300) “the PTP test is too liberal” and is of limited utility. Peres-Neto and Marques (2000) expressed concern at the use of one statistical test (the bootstrap) to evaluate another (parsimony PTP) and presented simulation studies that attempted to address the performance of the PTP test more directly. Their simulation studies involved performing PTP tests on randomly generated data. Because data are generated randomly, the null hypothesis is true and the number of times that the null hypothesis is rejected
