Abstract

A Monte Carlo approach was used to estimate the accuracy of a given tree reconstruction method for any number of taxa. In this procedure, we sampled randomly over all possible bifurcating trees, assigning substitution rates (branch lengths) to each edge from an exponential distribution so as to obtain a biologically sensible maximal observed distance. Three different sets of trees were studied: the unrestricted tree space, the biologically meaningful tree space as introduced by Nei et al. (1995, Science 267:253-254), and the population data tree space. We used this technique to elucidate the performance of neighbor joining as a function of the number of taxa, assuming that distances are uncorrected and that sequences evolve according to the Jukes-Cantor model. The accuracy of neighbor joining decreases almost exponentially with the number of taxa; however, the rate of decrease depends on the tree space studied. Although the accuracy decreases towards zero, the similarity, i.e., the number of partitions that are identical between the model tree and the reconstructed tree, is in all cases studied much higher than the value expected for two randomly chosen trees. While the probability of recovering the true tree is dramatically influenced by sequence length, the average similarity does not decrease substantially if branch lengths are not too short.

(Assigning edge lengths; Felsenstein zone; finite sequence length; Jukes-Cantor model; Monte Carlo sampling; neighbor joining.)

A great deal of work has been put into studies on the accuracy of tree reconstruction methods based on DNA or amino acid sequences, where accuracy is understood as the ability of a reconstruction method to recover the true branching pattern (topology) of the underlying tree from a given data set. To understand the accuracy of various tree-building methods, one must in principle consider the whole space of trees, i.e., all possible topologies with all possible assignments of edge lengths.
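Both accuracy (recovering the exact topology) and similarity (the number of identical partitions) can be made concrete by comparing the bipartitions, or splits, that the internal edges of two unrooted trees induce on the leaf set. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the edge-list tree representation and the function names `splits` and `shared_splits` are hypothetical. A binary tree on n leaves has n - 3 non-trivial splits, so two such trees have the same topology exactly when all n - 3 splits are shared; this count is closely related to the Robinson-Foulds distance.

```python
from collections import defaultdict


def splits(edges):
    """Return the set of non-trivial splits of an unrooted tree.

    `edges` is a list of (u, v) pairs (hypothetical representation); leaves are
    the nodes of degree 1. Each split is stored as the frozenset of leaves on the
    side that does NOT contain a fixed reference leaf, so both orientations of an
    edge map to the same key.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = {x for x in adj if len(adj[x]) == 1}
    ref = min(leaves)
    result = set()
    for u, v in edges:
        if u in leaves or v in leaves:
            continue  # pendant edges give trivial splits shared by every tree
        # Collect the leaves reachable from v without crossing the edge (u, v).
        side, stack, seen = set(), [v], {u, v}
        while stack:
            x = stack.pop()
            if x in leaves:
                side.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if ref in side:
            side = leaves - side  # normalise to the side without the reference leaf
        result.add(frozenset(side))
    return result


def shared_splits(tree_a, tree_b):
    """Number of non-trivial partitions common to two trees on the same leaves."""
    return len(splits(tree_a) & splits(tree_b))


# Two 5-taxon trees: ((A,B),C,(D,E)) and ((A,C),B,(D,E)); internal nodes are 1, 2, 3.
t1 = [("A", 1), ("B", 1), (1, 2), ("C", 2), (2, 3), ("D", 3), ("E", 3)]
t2 = [("A", 1), ("C", 1), (1, 2), ("B", 2), (2, 3), ("D", 3), ("E", 3)]
```

Here `shared_splits(t1, t1)` equals n - 3 = 2 (identical topologies), whereas `shared_splits(t1, t2)` is 1, because only the {D, E} split is common to both trees.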
More formally, let $\mathcal{T}_n$ be the set of all nondegenerate binary trees with $n$ leaves (taxa, species, sequences). The size of $\mathcal{T}_n$ increases rapidly with $n$: the number of trees is $|\mathcal{T}_n| = 1 \cdot 3 \cdot 5 \cdots (2n-5) = (2n-5)!!$. Each tree $T \in \mathcal{T}_n$ has exactly $2n-3$ branches, and each branch is assigned a number of substitutions between zero and infinity. We denote by $\mathbb{T}_n$ the space of all different tree topologies together with assignments of edge lengths. To understand the behavior of a tree reconstruction method for a given evolutionary model, one therefore ought to study the entire space $\mathbb{T}_n$ or an appropriately defined region thereof.
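The counting formula and the Monte Carlo sampling over topologies plus edge lengths can be sketched as follows. This is a minimal illustration, assuming that a topology is drawn uniformly by stepwise addition (each new leaf is attached to a uniformly chosen existing edge) and that every edge receives an independent exponential length; the names `num_unrooted_binary_trees` and `random_tree`, and the mean length `mean_length`, are hypothetical, and the paper's own scaling to a biologically sensible maximal distance is not reproduced here.

```python
import random


def num_unrooted_binary_trees(n):
    """Number of labelled unrooted binary trees with n >= 3 leaves: 1*3*5*...*(2n-5)."""
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count


def random_tree(n, mean_length=0.1, rng=random):
    """Sample a topology uniformly from the n-leaf binary trees and give every
    edge an independent exponential length with (hypothetical) mean `mean_length`.

    Leaves are labelled 0..n-1 and internal nodes n..2n-3. Returns {(u, v): length}.
    """
    if n < 3:
        raise ValueError("need at least 3 leaves")
    hub = n  # the 3-leaf star is the only binary tree on 3 leaves
    edges = [(0, hub), (1, hub), (2, hub)]
    next_internal = n + 1
    for leaf in range(3, n):
        # Subdivide a uniformly chosen edge and hang the new leaf there. Every
        # labelled tree arises from exactly one sequence of choices, and a tree
        # with k leaves has 2k-3 edges, so each topology has probability
        # 1/(1*3*5*...*(2n-5)), i.e. the topology is uniform.
        u, v = edges.pop(rng.randrange(len(edges)))
        w = next_internal
        next_internal += 1
        edges += [(u, w), (w, v), (leaf, w)]
    return {edge: rng.expovariate(1.0 / mean_length) for edge in edges}
```

For example, `random_tree(50, rng=random.Random(1))` returns 97 weighted edges, matching the 2n - 3 branches of every tree with 50 leaves. Restricting the sample to the biologically meaningful or population data tree spaces would require additional constraints that this sketch does not model.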
