Abstract
The tree inclusion problem is, given two node-labeled trees P and T (the “pattern tree” and the “target tree”), to locate every minimal subtree in T (if any) that can be obtained by applying a sequence of node insertion operations to P. Although the ordered tree inclusion problem is solvable in polynomial time, the unordered tree inclusion problem is NP-hard. The currently fastest algorithm for the latter is a classic algorithm by Kilpeläinen and Mannila from 1995 that runs in O(d 2^{2d} mn) time, where m and n are the sizes of the pattern and target trees, respectively, and d is the degree of the pattern tree. Here, we develop a new algorithm that runs in O(d 2^d mn^2) time, improving the exponential factor from 2^{2d} to 2^d by considering a particular type of ancestor-descendant relationships that is suitable for dynamic programming. We also study restricted variants of the unordered tree inclusion problem.
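To see when the trade-off between the two stated bounds pays off, one can divide the earlier bound, O(d 2^{2d} mn), by the new one, O(d 2^d mn^2). The following is only a back-of-the-envelope ratio of the two expressions with constant factors ignored; it is not a statement taken from the paper.

```latex
\[
  \frac{d \, 2^{2d} \, m n}{d \, 2^{d} \, m n^{2}} \;=\; \frac{2^{d}}{n}
\]
```

Under this reading, the new bound is the smaller of the two whenever 2^d > n, that is, when the pattern degree d exceeds log_2 n.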
Highlights
Tree pattern matching and measuring the similarity of trees are classic problem areas in theoretical computer science.
The first algorithm to achieve this bound ran in O(n^6) time [20], where n is the total number of nodes in T_1 and T_2. It was gradually improved upon until, thirty years later, Demaine et al. [12] presented an O(n^3)-time algorithm, which was proved to be worst-case optimal under the conjecture that there is no truly subcubic-time algorithm for the all-pairs shortest paths problem [9].
We assume the following formulation of the problem: given a “text tree” T and a “pattern tree” P, locate every minimal subtree in T that can be obtained by applying a sequence of node insertion operations to P. (Equivalently, one may define the tree inclusion problem so that only node deletion operations on T are allowed.) For unordered trees, Kilpeläinen and Mannila [14] proved the problem to be NP-hard in general but solvable in polynomial time when the degree of the pattern tree is bounded from above by a constant.
Summary
Tree pattern matching and measuring the similarity of trees are classic problem areas in theoretical computer science. An important special case of the tree edit distance problem, known as the tree inclusion problem, is obtained when only node insertion operations are allowed. (Equivalently, one may define the tree inclusion problem so that only node deletion operations on T are allowed.) For unordered trees, Kilpeläinen and Mannila [14] proved the problem to be NP-hard in general but solvable in polynomial time when the degree (outdegree) of the pattern tree is bounded from above by a constant. Note that the special case of the tree inclusion problem in which node insertion operations may only insert new leaves corresponds to a subtree isomorphism problem, which can be solved in polynomial time for unordered trees [17].
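The deletion-based definition above can be made concrete with a small brute-force checker. This is a minimal sketch for illustration only: it is neither the Kilpeläinen and Mannila algorithm nor the new algorithm, and it runs in exponential time. The tuple-based tree representation and the names `tree`, `embeds`, `embeds_forest` and `minimal_inclusion_roots` are assumptions introduced here. The idea is that a target root is either matched with a single pattern root of the same label or deleted, in which case its children form a forest among which the pattern trees are distributed.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical representation: a tree is (label, children), children is a tuple of trees.
def tree(label, *children):
    return (label, tuple(children))

@lru_cache(maxsize=None)
def embeds(patterns, target):
    """True if all trees in `patterns` fit inside `target` with pairwise incomparable roots."""
    if not patterns:
        return True
    t_label, t_children = target
    if len(patterns) == 1:
        p_label, p_children = patterns[0]
        # Option 1: keep the target root and match it with the single pattern root.
        if p_label == t_label and embeds_forest(p_children, t_children):
            return True
    # Option 2: delete the target root; its children then form a forest.
    return embeds_forest(patterns, t_children)

@lru_cache(maxsize=None)
def embeds_forest(patterns, targets):
    """True if `patterns` can be distributed over the trees in `targets`."""
    if not targets:
        return not patterns
    first, rest = targets[0], targets[1:]
    idx = range(len(patterns))
    # Try every subset of the pattern trees for the first target tree.
    for k in range(len(patterns) + 1):
        for chosen in combinations(idx, k):
            inside = tuple(patterns[i] for i in chosen)
            outside = tuple(patterns[i] for i in idx if i not in chosen)
            if embeds(inside, first) and embeds_forest(outside, rest):
                return True
    return False

def minimal_inclusion_roots(pattern, target):
    """Subtrees of `target` that include `pattern` while none of their proper subtrees does."""
    found = []

    def visit(node):
        hit_below = False
        for child in node[1]:
            hit_below |= visit(child)  # visit every child, no short-circuit
        if hit_below:
            return True
        if embeds((pattern,), node):
            found.append(node)
            return True
        return False

    visit(target)
    return found

# Usage: P = a(b, c); in T, deleting x from a(x(b), c) yields P.
P = tree("a", tree("b"), tree("c"))
T = tree("r", tree("a", tree("x", tree("b")), tree("c")), tree("a", tree("b")))
roots = minimal_inclusion_roots(P, T)
```

In this example `roots` contains only the subtree a(x(b), c): the other a-labeled subtree has no c-labeled node, and the whole tree rooted at r is not minimal because one of its proper subtrees already includes P.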