Abstract
Background: Measuring similarities between tree-structured data is important for the analysis of RNA secondary structures, phylogenetic trees, glycan structures, and vascular trees. The edit distance is one of the most widely used measures for comparing tree-structured data. However, computing the edit distance for rooted unordered trees is known to be NP-hard. Furthermore, almost no software tools are available that can compute the exact edit distance for unordered trees.
Results: In this paper, we present a practical method for computing the edit distance between rooted unordered trees. In this method, the edit distance problem for unordered trees is transformed into the maximum clique problem, and efficient solvers for the maximum clique problem are then applied. We applied the proposed method to similar structure search for glycan structures. The results suggest that the proposed method can efficiently compute the edit distance for moderate-size unordered trees. They also suggest that its accuracy is comparable to that of the edit distance for ordered trees and of an existing method for glycan search.
Conclusions: The proposed method is simple but useful for computing the edit distance between unordered trees. The object code is available upon request.
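The reduction to maximum clique mentioned in the abstract can be illustrated with a small sketch. It assumes the usual mapping-based view of edit distance: each graph vertex is a pair of nodes (one from each tree), and two pairs are joined by an edge when they are one-to-one and preserve the ancestor relation. The function names, the dictionary-based tree encoding, and the exhaustive clique search are all assumptions made for illustration; the exhaustive search merely stands in for the efficient solvers the abstract refers to, and the sketch places no special constraint on the roots, which the paper may handle differently.

```python
from itertools import product


def ancestors(parent, v):
    """Return the set of proper ancestors of node v."""
    out = set()
    while parent[v] is not None:
        v = parent[v]
        out.add(v)
    return out


def compatible(p, q, anc1, anc2):
    """Two node pairs can coexist in a mapping if they are one-to-one
    and agree on the ancestor relation in both directions."""
    (u1, v1), (u2, v2) = p, q
    if u1 == u2 or v1 == v2:
        return False
    return ((u1 in anc1[u2]) == (v1 in anc2[v2])
            and (u2 in anc1[u1]) == (v2 in anc2[v1]))


def max_weight_clique(vertices, weight, adj):
    """Exhaustive branch and bound; suitable for tiny inputs only."""
    best = 0

    def extend(cands, total):
        nonlocal best
        best = max(best, total)
        # Prune: even taking every remaining candidate cannot beat best.
        if total + sum(weight[c] for c in cands) <= best:
            return
        for i, c in enumerate(cands):
            rest = [d for d in cands[i + 1:] if d in adj[c]]
            extend(rest, total + weight[c])

    extend(list(vertices), 0)
    return best


def unordered_edit_distance(parent1, label1, parent2, label2):
    """Unit-cost distance as |T1| + |T2| - (maximum clique weight)."""
    anc1 = {u: ancestors(parent1, u) for u in parent1}
    anc2 = {v: ancestors(parent2, v) for v in parent2}
    pairs = list(product(parent1, parent2))
    # Assumed weights: 2 for equal labels, 1 for different labels.
    weight = {(u, v): 2 if label1[u] == label2[v] else 1 for u, v in pairs}
    adj = {p: {q for q in pairs if compatible(p, q, anc1, anc2)}
           for p in pairs}
    return len(parent1) + len(parent2) - max_weight_clique(pairs, weight, adj)


# Trees are given as {node: parent} (root has parent None) plus labels.
t1_parent, t1_label = {0: None, 1: 0, 2: 0}, {0: "a", 1: "b", 2: "c"}
t2_parent, t2_label = {0: None, 1: 0, 2: 0}, {0: "a", 1: "c", 2: "b"}
# Same children in a different order: distance 0 for unordered trees.
print(unordered_edit_distance(t1_parent, t1_label, t2_parent, t2_label))
```

The number of graph vertices is the product of the two tree sizes, so this construction grows quickly; that is why the choice of clique solver matters in practice.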
Highlights
Measuring similarities between tree structured data is important for analysis of RNA secondary structures, phylogenetic trees, glycan structures, and vascular trees
We used glycan structures obtained from the KEGG/Glycan database [26]
Results on efficiency: First, we examined the computational efficiency of the maximum vertex weighted clique (MWCQ)-based method, using the standard weighting scheme (i.e., f(v, v) = 2 and f(u, v) = 1 for l(u) ≠ l(v)), which corresponds to the unit-cost edit distance
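One way to see why these weights correspond to unit costs is the short derivation below. It assumes the mapping-based view of edit distance, in which M is a one-to-one, ancestor-preserving set of matched node pairs, and it reads f(v, v) = 2 as the weight of a pair with identical labels. The symbols M and cost(M) are notation introduced here, not taken from the source.

```latex
\begin{align*}
\mathrm{cost}(M)
  &= \underbrace{(|T_1| - |M|)}_{\text{deletions}}
   + \underbrace{(|T_2| - |M|)}_{\text{insertions}}
   + \underbrace{\bigl|\{(u,v) \in M : l(u) \neq l(v)\}\bigr|}_{\text{relabelings}} \\
  &= |T_1| + |T_2| - \sum_{(u,v) \in M} f(u,v),
  \qquad
  f(u,v) =
  \begin{cases}
    2 & \text{if } l(u) = l(v), \\
    1 & \text{if } l(u) \neq l(v).
  \end{cases}
\end{align*}
```

Under these assumptions, minimizing the unit-cost edit distance is the same as maximizing the total weight of the mapping, which is the quantity a maximum vertex weighted clique solver optimizes.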
Summary
We implemented the above-mentioned maximum clique-based method (MCQ-based method) and maximum vertex weighted clique-based method (MWCQ-based method) in C.
Results on similar structure search: Although the ordered and unordered tree edit distances are widely accepted (dis)similarity measures on trees, we performed computational experiments to examine how useful they are for similarity search of glycans. Glycan alignment scores were introduced for efficient comparison of glycan structures. Although the alignment score is based on tree edit distance, the deletion (and corresponding insertion) operation is simplified so that only one child and its descendants can survive if a node is deleted. It is seen from these figures that the tree edit distance for unordered trees provides the most flexible matching.