Abstract
In the practice of molecular evolution, different phylogenetic trees for the same group of species are often produced, either by procedures that use diverse optimality criteria [24] or from different genes [15, 16, 17, 18, 14]. Comparing these trees to find their similarities (e.g., agreement or consensus) and dissimilarities (i.e., distance) is thus an important issue in computational molecular biology. The nearest neighbor interchange (nni) distance [29, 28, 34, 3, 6, 2, 19, 20, 23, 33, 22, 21, 26] is a natural distance metric that has been extensively studied. Despite its many appealing aspects, such as simplicity and sensitivity to tree topologies, computing this distance has remained very challenging, and many algorithmic and complexity issues about it have remained unresolved. This paper studies the complexity of computing the nni distance, and efficient approximation algorithms for it and for a natural extension of this distance to weighted phylogenies. The following results answer many open questions about the nni distance posed in the literature.
1. Computing the nni distance between two labeled trees is NP-complete. This solves a 25-year-old open question that has appeared repeatedly in, for example, [29, 34, 3, 6, 2, 19, 20, 23, 22, 21, 26].
2. Computing the nni distance between two unlabeled trees is also NP-complete. This answers an open question in [3] for which an erroneous proof appeared in [23].
3. Biological applications motivate us to extend the nni distance to weighted phylogenies, where edge weights indicate the time-span of evolution along each edge. We present an O(n^2) time approximation algorithm for computing the nni distance on weighted phylogenies with a performance ratio of 4 log n + 4, where n is the number of leaves in the phylogenies.
We also observe that the nni distance is in fact identical to the linear-cost subtree-transfer distance on unweighted phylogenies discussed in [4, 5].
Some consequences of this observation are also discussed.
1991 Mathematics Subject Classification. Primary 68Q17, 68W40; Secondary 68Q25.
The results reported here also form a subset of the results that appeared in Proc. 8th Annual ACM-SIAM Symposium on Discrete Algorithms, 1997, pp. 427-436 [4]. The remaining results of the conference paper, which do not appear in this paper, appeared separately in Algorithmica, Vol. 25, No. 2, pp. 176-195, 1999.
The first author was supported by a CGAT (Canadian Genome Analysis and Technology) grant. The second author was supported in part by CGAT and NSF grant 9205982. The third author was supported in part by NSERC Operating Grant OGP0046613 and CGAT. The fourth author was supported by NSERC Operating Grant OGP0046506 and CGAT. The fifth author was supported by an NSERC International Fellowship and CGAT.
Work done while the first author was at University of Waterloo and McMaster University, the second author was visiting University of Waterloo, the third author was visiting University of Washington, and the fifth and the sixth authors were at University of Waterloo.