Abstract
Most workers attempting to infer evolutionary trees from polymorphic data using established parsimony procedures either have ignored frequency information entirely or have tolerated solutions that hypothesize ancestors that cannot possibly exist in the space of the original frequencies. We describe a new method for inferring evolutionary trees from gene frequency data that avoids these limitations by enforcing biologically reasonable constraints without discarding the frequency information. We argue that it is necessary to work directly with frequencies rather than to reduce the data by presence/absence coding. Our approach appears to be free of logical difficulties affecting a variety of other parsimony methods that have been applied to frequency data.

(Parsimony; evolutionary trees; gene frequencies; electrophoresis; linear programming; phylogenetic analysis.)

Since their development in the early 1960s, numerical methods based on the principle of parsimony have enjoyed considerable popularity as tools in phylogenetic inference. Although parsimony algorithms such as the Wagner method (Kluge and Farris, 1969; Farris, 1970) can, in theory, be applied to continuous characters, most applications have involved characters with discrete states, either naturally or as the result of recoding. In this context, there is a logical correspondence between minimizing the number of evolutionary transitions (steps) and minimizing the number of ad hoc hypotheses of homoplasy (parallelism, convergence, and reversal) needed to explain the data (Farris, 1983). But when characters are measured on a continuous scale, as gene frequencies are, the association between the amount of evolutionary change and the number of extra assumptions of homoplasy required becomes less obvious.
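The problem of ancestors that "cannot possibly exist in the space of the original frequencies" can be made concrete with a small hypothetical example. The sketch below assumes a three-population star tree, one locus with three alleles, and a Wagner-style length (sum of absolute frequency differences) in which each allele frequency is optimized as an independent continuous character. The data and the helper names `manhattan`, `star_length`, and `best_on_simplex` are illustrative assumptions, and the grid search is only a brute-force stand-in for a constrained optimization, not the linear programming procedure of this paper.

```python
from fractions import Fraction
from statistics import median

# Hypothetical example data (not from the paper): one locus with three
# alleles; each population is fixed for a different allele.
populations = {
    "A": [Fraction(1), Fraction(0), Fraction(0)],
    "B": [Fraction(0), Fraction(1), Fraction(0)],
    "C": [Fraction(0), Fraction(0), Fraction(1)],
}


def manhattan(x, y):
    """Sum of absolute allele-frequency differences (Wagner-style length)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def star_length(ancestor, pops):
    """Length of a three-taxon star tree whose only internal node is `ancestor`."""
    return sum(manhattan(ancestor, freqs) for freqs in pops.values())


# Unconstrained: each allele frequency is treated as an independent
# continuous character, so its optimal ancestral value is the tip median.
independent = [median(column) for column in zip(*populations.values())]


def best_on_simplex(pops, steps=20):
    """Brute-force search over ancestors whose three frequencies sum to 1."""
    best = None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            ancestor = [Fraction(i, steps), Fraction(j, steps),
                        Fraction(steps - i - j, steps)]
            length = star_length(ancestor, pops)
            if best is None or length < best[0]:
                best = (length, ancestor)
    return best


print("per-allele medians:", [float(f) for f in independent])
print("sum of ancestral frequencies:", float(sum(independent)))
print("unconstrained length:", float(star_length(independent, populations)))
print("constrained length:", float(best_on_simplex(populations)[0]))
```

In this example every per-allele median is 0, so the reconstructed "ancestor" has frequencies summing to 0 and the tree has length 3. Every ancestor whose frequencies sum to 1 gives length 4, so the shorter unconstrained tree is bought with an impossible ancestor. Exact `Fraction` arithmetic is used so the comparison is not blurred by rounding.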
One solution to the problem is to alter the data, and many workers have, explicitly or implicitly, favored discarding frequency information in order to obtain data that are representable in discrete form (Throckmorton, 1978; Farris, 1981; Mickevich and Mitter, 1981; Straney, 1981; Patton and Avise, 1983). As we will argue here, however, this approach is subject to a serious problem of sampling error and fails to cope adequately with the wealth of polymorphism revealed by electrophoretic and serological techniques.

Gene frequency data have also been subjected directly to parsimony analysis, albeit at a sacrifice in simplicity of interpretation. Cavalli-Sforza and Edwards's (1967) method of minimum evolution searches for trees that minimize the total amount of gene frequency change in multidimensional Euclidean space. For a given tree topology, optimal coordinates (arrays of allele frequencies) for hypothetical ancestral populations may be uniquely determined using an iterative algorithm (Thompson, 1973). Methods for selecting, from the set of possible topologies, those requiring the minimum amount of evolution are discussed by Cavalli-Sforza and Edwards (1967), Kidd and Sgaramella-Zonta (1971), and Thompson (1973). Rogers (1984) has recently described a similar approach, applying it to his earlier (1972) distance measure, which is widely used in electrophoretic studies. Rogers's (1984) method has the advantage that it can be generalized to non-Euclidean distances such as the arc distance of Cavalli-Sforza and Edwards (1967) and even to Nei's (1972, 1978) nonmetric distances (Rogers, 1986).
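For a fixed topology, the minimum evolution criterion amounts to placing the ancestral nodes so that the summed Euclidean branch lengths are as small as possible. A minimal sketch of the simplest case, a star tree with a single ancestral node, follows. It uses Weiszfeld's iteration for the geometric median as an assumed stand-in and is not presented as Thompson's (1973) algorithm. The data and the function names `euclid`, `optimal_ancestor`, and `tree_length` are hypothetical, and using raw allele frequencies directly as coordinates is a simplifying assumption.

```python
import math

# Hypothetical allele frequencies (one locus, three alleles) for three
# populations; raw frequencies are used directly as Euclidean coordinates.
tips = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]


def euclid(x, y):
    """Euclidean distance between two allele-frequency arrays."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def optimal_ancestor(points, iters=10_000, tol=1e-12):
    """Weiszfeld iteration for the point minimizing the summed Euclidean
    distance to `points`, i.e. the internal node of a star tree."""
    dim = len(points[0])
    # Start at the centroid of the tips.
    x = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iters):
        # Weight each tip by the inverse of its current distance; the floor
        # is a crude guard against division by zero if x hits a tip exactly.
        w = [1.0 / max(euclid(x, p), 1e-15) for p in points]
        total = sum(w)
        new = [sum(wi * p[k] for wi, p in zip(w, points)) / total
               for k in range(dim)]
        if euclid(new, x) < tol:
            return new
        x = new
    return x


def tree_length(ancestor, points):
    """Total amount of gene frequency change on the star tree."""
    return sum(euclid(ancestor, p) for p in points)


ancestor = optimal_ancestor(tips)
print("ancestor:", [round(f, 4) for f in ancestor])
print("sum of frequencies:", round(sum(ancestor), 12))
print("tree length:", round(tree_length(ancestor, tips), 4))
```

Each update is a weighted average of the tip frequency arrays, so in this Euclidean sketch the ancestor's allele frequencies continue to sum to 1. The sketch handles only one internal node; trees with several ancestral populations need every internal node optimized jointly.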