Abstract
The maximum parsimony (MP) method for inferring phylogenies is widely used, but little is known about its limitations in non-asymptotic situations. This study employs large-scale computations with simulated phylogenetic data to estimate the probability that MP succeeds in finding the true phylogeny for up to twelve taxa and 256 characters. The set of candidate phylogenies is taken to be the unrooted binary trees; for each simulated data set, the tree lengths of all (2n − 5)!! candidates are computed to evaluate quantities related to the performance of MP, such as the probability of finding the true phylogeny, the probability that the tree with the shortest length is unique, the probability that the true phylogeny has the shortest tree length, and the expected inverse of the number of trees sharing the shortest length. The tree length distributions are also used to evaluate and extend the skewness test of Hillis for distinguishing between random and phylogenetic data. The results indicate, for example, that MP achieves a success probability of at least 0.9 from around 128 characters onward. The skewness test is found to perform well on simulated data, and the study extends its scope to up to twelve taxa.
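To give a sense of the scale of the exhaustive search described above, the following minimal Python sketch counts the (2n − 5)!! unrooted binary trees on n taxa and scores one candidate tree with Fitch's small-parsimony algorithm. It is an illustration only, not the study's implementation: the function names, the nested-tuple tree encoding, and the toy binary alignment are assumptions made for this example. The tree is rooted at an arbitrary internal edge, which does not change the parsimony length.

```python
def num_unrooted_binary_trees(n):
    """Return (2n - 5)!!, the number of unrooted binary trees on n labelled taxa."""
    if n < 3:
        raise ValueError("need at least three taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count


def fitch(tree, states):
    """Fitch pass for one character.

    tree   -- a leaf name, or a 2-tuple of subtrees (hypothetical encoding)
    states -- dict mapping leaf name to its state for this character
    Returns (candidate state set at this node, number of changes below it).
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets force one extra change at this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Sum the Fitch cost over all characters (columns) of the alignment."""
    n_chars = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_chars):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


# Toy data (made up for illustration): four taxa, two binary characters.
alignment = {"A": "00", "B": "01", "C": "11", "D": "10"}
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
lengths = [tree_length(t, alignment) for t in candidates]
```

For twelve taxa, `num_unrooted_binary_trees(12)` gives 654,729,075 candidate trees per data set, which is why scoring every candidate requires large-scale computation. In the toy example two of the three trees tie for the shortest length, the situation that the "expected inverse of the number of trees sharing the shortest length" accounts for.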
Highlights
The maximum parsimony (MP) method for selecting a phylogenetic tree was developed in the early 1970s (e.g. [1]).
Its fundamental idea is to find the phylogeny that minimizes the amount of evolutionary change required. It is distinctly different from probabilistic methods, although it can be shown to be equivalent to the maximum likelihood approach for the exceedingly complex and unrealistic “no common mechanism” model [2].
Our results enabled us to measure the performance of the skewness test [14], which attempts to detect the presence of “phylogenetic signal” in a given data set by computing the third standardized moment of the tree-length distribution.
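The statistic behind the skewness test, the third standardized moment of the tree-length distribution, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name is hypothetical, the input is simply a list with one parsimony length per candidate tree, and the population form of the moments (dividing by the number of trees) is assumed because the text does not say which estimator is used.

```python
def skewness(tree_lengths):
    """Third standardized moment m3 / m2**1.5 of a list of tree lengths."""
    n = len(tree_lengths)
    mean = sum(tree_lengths) / n
    m2 = sum((x - mean) ** 2 for x in tree_lengths) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in tree_lengths) / n  # third central moment
    if m2 == 0:
        raise ValueError("all tree lengths are equal; skewness is undefined")
    return m3 / m2 ** 1.5


# Made-up length distributions for illustration only.
symmetric = [10, 11, 12, 13, 14]
left_tailed = [6, 12, 13, 13, 14, 14, 14]

g1_symmetric = skewness(symmetric)
g1_left_tailed = skewness(left_tailed)
```

A symmetric distribution gives a value of zero, while a distribution with a long tail of unusually short trees gives a negative value. In the usual reading of the test, a strongly negative value is taken as evidence of phylogenetic signal; the critical values needed to decide what counts as strongly negative are not given in this text and are not reproduced here.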
Summary
The maximum parsimony (MP) method for selecting a phylogenetic tree was developed in the early 1970s (e.g. [1]). Its fundamental idea is to find the phylogeny that minimizes the amount of evolutionary change required (and thereby maximizes parsimony). It is distinctly different from probabilistic methods, although it can be shown to be equivalent to the maximum likelihood approach for the exceedingly complex and unrealistic “no common mechanism” model [2]. Maximum parsimony has been found to outperform likelihood methods when evolution is heterogeneous [4]. It is well known that MP fails to be consistent under certain conditions [5, 6]. This is not a prohibitive restriction: sufficient conditions under which MP is consistent have been derived (cf. [7] and the references therein), and one can find applied studies where