Abstract

The tree inclusion problem is, given two node-labeled trees P and T (the “pattern tree” and the “target tree”), to locate every minimal subtree in T (if any) that can be obtained by applying a sequence of node insertion operations to P. Although the ordered tree inclusion problem is solvable in polynomial time, the unordered tree inclusion problem is NP-hard. The currently fastest algorithm for the latter is a classic algorithm by Kilpeläinen and Mannila from 1995 that runs in $O(d 2^{2d} mn)$ time, where m and n are the sizes of the pattern and target trees, respectively, and d is the degree of the pattern tree. Here, we develop a new algorithm that runs in $O(d 2^{d} mn^{2})$ time, improving the exponential factor from $2^{2d}$ to $2^{d}$ by considering a particular type of ancestor-descendant relationship that is suitable for dynamic programming. We also study restricted variants of the unordered tree inclusion problem.
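
As a quick reading aid (plain arithmetic on the two bounds stated in the abstract, ignoring the constants hidden by the O-notation), the ratio of the earlier bound to the new one is:

```latex
\[
  \frac{d \, 2^{2d} \, m n}{d \, 2^{d} \, m n^{2}} \;=\; \frac{2^{d}}{n}
\]
```

Under this comparison, the new bound is the smaller of the two whenever $2^{d} > n$, that is, when $d > \log_2 n$; for smaller d the extra factor of n in the new bound outweighs the saving in the exponential factor.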

Highlights

  • Tree pattern matching and measuring the similarity of trees are classic problem areas in theoretical computer science

  • The first algorithm to achieve this bound ran in $O(n^6)$ time [20], where n is the total number of nodes in $T_1$ and $T_2$, and it was gradually improved upon until Demaine et al. [12] presented an $O(n^3)$-time algorithm thirty years later, which was proved to be worst-case optimal under a conjecture that there is no truly subcubic time algorithm for the all pairs shortest paths problem [9]

  • We assume the following formulation of the problem: given a “text tree” T and a “pattern tree” P, locate every minimal subtree in T that can be obtained by applying a sequence of node insertion operations to P. (Equivalently, one may define the tree inclusion problem so that only node deletion operations on T are allowed.) For unordered trees, Kilpeläinen and Mannila [14] proved the problem to be NP-hard in general but solvable in polynomial time when the degree of the pattern tree is bounded from above by a constant


Summary

Introduction

Tree pattern matching and measuring the similarity of trees are classic problem areas in theoretical computer science. An important special case of the tree edit distance problem, known as the tree inclusion problem, is obtained when only node insertion operations are allowed. (Equivalently, one may define the tree inclusion problem so that only node deletion operations on T are allowed.) For unordered trees, Kilpeläinen and Mannila [14] proved the problem to be NP-hard in general but solvable in polynomial time when the degree (outdegree) of the pattern tree is bounded from above by a constant. Note that the special case of the tree inclusion problem where node insertion operations are only allowed to insert new leaves corresponds to a subtree isomorphism problem, which can be solved in polynomial time for unordered trees [17].
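
To make the deletion-based formulation concrete, here is a minimal brute-force sketch of unordered tree inclusion. It is not the algorithm of this paper, nor that of Kilpeläinen and Mannila; it simply tries every way of distributing pattern subtrees among target subtrees and therefore takes exponential time. The names `Node`, `forest_included`, `group_in_tree`, `included`, and `minimal_inclusions` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """A node of a rooted, unordered, node-labeled tree (illustrative type)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def forest_included(pats, targets):
    """True if the pattern subtrees rooted at `pats` can all be obtained from
    the target subtrees rooted at `targets` by deleting nodes, with the images
    of distinct pattern roots never in an ancestor-descendant relation."""
    if not pats:
        return True
    if not targets:
        return False
    first, rest = targets[0], targets[1:]
    # Brute force: try every subset of pattern subtrees to place inside `first`.
    for k in range(len(pats) + 1):
        for chosen in combinations(range(len(pats)), k):
            inside = [pats[i] for i in chosen]
            outside = [pats[i] for i in range(len(pats)) if i not in chosen]
            if group_in_tree(inside, first) and forest_included(outside, rest):
                return True
    return False


def group_in_tree(pats, t):
    """True if all pattern subtrees in `pats` fit inside the subtree rooted at t."""
    if not pats:
        return True
    # Option 1: a single pattern root is mapped onto t itself (labels must match).
    if len(pats) == 1 and pats[0].label == t.label:
        if forest_included(pats[0].children, t.children):
            return True
    # Option 2: t is deleted, so everything must fit strictly below t.
    return forest_included(pats, t.children)


def included(p, t):
    """True if pattern tree p is included in target tree t."""
    return group_in_tree([p], t)


def minimal_inclusions(p, t):
    """Nodes v of t whose subtree includes p while no child's subtree does."""
    found, stack = [], [t]
    while stack:
        v = stack.pop()
        if included(p, v) and not any(included(p, c) for c in v.children):
            found.append(v)
        stack.extend(v.children)
    return found


pattern = Node("a", [Node("b"), Node("c")])
target = Node("a", [Node("x", [Node("b")]), Node("y", [Node("c"), Node("d")])])
print(included(pattern, target))  # True: delete x, y and d from the target
```

In the example, deleting the nodes labeled x, y, and d from the target leaves exactly the pattern. By contrast, a target in which c occurs only below b would be rejected, because deletions can never turn an ancestor-descendant pair into siblings.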

Practical applications
New results
Preliminaries
Concluding remarks