Abstract

The degree of underestimation of branch lengths by the maximum parsimony principle is studied. The expected number of nucleotide changes per site under the maximum parsimony principle is computed, and it is compared with the expected number of nucleotide substitutions. A tree topology with no hierarchical structure is considered for mathematical simplicity. It is shown that as long as the evolutionary distance is less than 0.2, the maximum parsimony principle gives good estimates of nucleotide substitutions. When the evolutionary distance is greater than 0.2, however, the method gives gross underestimates of nucleotide substitutions. [Branching; parsimony; phylogenetics; topology.]

There are two major problems in constructing a phylogenetic tree from molecular data. One is the determination of the topology of a tree and the other is the estimation of branch lengths. For the first problem, the maximum parsimony method (Camin and Sokal, 1965; Fitch, 1977) has been extensively used for amino acid or nucleotide sequence data. For the estimation of branch lengths, however, this method is expected to underestimate the number of amino acid or nucleotide substitutions. This property comes from the principle of the method itself: minimize the number of changes required. Thus each branch length (estimated by Fitch's [1971] method) is expected to be smaller than the real length on average.

In spite of this known shortcoming, the maximum parsimony method seems to be quite appropriate for the estimation of branch length in terms of amino acid or nucleotide substitutions when closely related sequences are compared, since the probability of backward and parallel substitutions is negligible in this situation. But how close should sequences be? The number of sequences compared is also related to this problem, because we expect to extract more and more changes as the number of sequences is increased. So far, there seems to be no theoretical study on these subjects.

In this paper, I show the effect of the amount of divergence and the number of nucleotide sequences on the estimates of branch lengths by the maximum parsimony principle. For simplicity, I consider the model of random nucleotide substitution (Jukes and Cantor, 1969). A constant rate of evolution, or the molecular clock, is also assumed. Further, a tree topology with no hierarchical structure is considered. Under these assumptions, the expected number of required nucleotide changes per site estimated by the maximum parsimony principle is computed.
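
The comparison described above can be illustrated with a minimal sketch. It is not the paper's derivation. It assumes a star tree (one interior node, `n` tips), the Jukes and Cantor model, and an equal branch length `d` (expected substitutions per site) from the root to every tip. Using `d` as the measure of divergence is an assumption of this sketch, and the paper's definition of evolutionary distance may differ. The function names are hypothetical. On a star tree, the minimum number of changes at a site is `n` minus the count of the most frequent nucleotide, so its expectation can be obtained by enumerating the multinomial distribution of tip states.

```python
import math


def jc_diff_prob(d):
    """Jukes-Cantor probability that a tip differs from the root
    after d expected substitutions per site on the branch."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def expected_parsimony_changes(n, d):
    """Expected minimum number of changes per site on a star tree
    with n tips, each at branch length d from the root (assumed setup)."""
    p = jc_diff_prob(d)
    # By symmetry, fix the root state; a tip keeps it with probability 1 - p
    # and shows each of the other three nucleotides with probability p / 3.
    probs = (1.0 - p, p / 3.0, p / 3.0, p / 3.0)
    n_fact = math.factorial(n)
    total = 0.0
    # Enumerate all tip-state counts (a, c, g, t) that sum to n.
    for a in range(n + 1):
        for c in range(n - a + 1):
            for g in range(n - a - c + 1):
                t = n - a - c - g
                coef = n_fact // (
                    math.factorial(a) * math.factorial(c)
                    * math.factorial(g) * math.factorial(t)
                )
                prob = (coef * probs[0] ** a * probs[1] ** c
                        * probs[2] ** g * probs[3] ** t)
                # Parsimony count at this site: tips not in the majority state.
                total += prob * (n - max(a, c, g, t))
    return total


def expected_substitutions(n, d):
    """Expected actual number of substitutions per site: sum of branch lengths."""
    return n * d


# Ratio of parsimony changes to actual substitutions for a few divergences.
for d in (0.05, 0.1, 0.2, 0.5, 1.0):
    ratio = expected_parsimony_changes(5, d) / expected_substitutions(5, d)
    print(f"n=5, d={d}: parsimony/actual = {ratio:.3f}")
```

Under these assumptions the ratio stays below 1 and shrinks as `d` grows, which is the qualitative behavior the abstract describes. Varying `n` in the loop shows how the number of sequences enters the comparison.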

