Efficient Similarity Search for Tree-Structured Data

Guoliang Li, Xuhui Liu, Lizhu Zhou, Jianhua Feng

doi:10.1007/978-3-540-69497-7_11

Abstract

Tree-structured data are becoming ubiquitous, and manipulating them based on similarity is essential for many applications. Although similarity search on textual data has been extensively studied, searching for similar trees is still an open problem due to the high complexity of computing the similarity between trees, especially for large numbers of trees. In this paper, we propose to transform tree-structured data into strings with a one-to-one mapping. We prove that the edit distance of the corresponding strings forms a bound for the similarity measures between trees, including tree edit distance, largest common subtrees and smallest common super-trees. Based on this theoretical analysis, we can employ any existing algorithm for approximate string search for effective similarity search on trees. Moreover, we embed the bound into a filter-and-refine framework to facilitate similarity search on tree-structured data. The experimental results show that our algorithm achieves high performance and outperforms state-of-the-art methods significantly. Our method is especially suitable for accelerating similarity query processing on large numbers of trees in massive datasets.
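
The filter-and-refine idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual tree-to-string mapping or the exact form of its bound, so everything below is an assumption made for illustration: trees are ordered, labels are single characters, the hypothetical `encode` function writes each node as its label, then its children, then a closing ")", and all edit operations cost 1. Under this particular encoding a relabel changes one character and a node insertion or deletion changes two, so the string edit distance is at most twice the tree edit distance. That inequality is what the filter step relies on; the paper's own mapping and bound may differ.

```python
from functools import lru_cache

# A tree is (label, children): label is one character, children is a tuple of trees.

def encode(tree):
    """Hypothetical one-to-one encoding: label, then children, then ')'."""
    label, children = tree
    return label + "".join(encode(c) for c in children) + ")"

def levenshtein(a, b):
    """Unit-cost string edit distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between ordered forests (tuples of trees).

    Textbook rightmost-root recursion; adequate for small trees only.
    """
    if not f and not g:
        return 0
    if not f:
        _, wc = g[-1]
        return forest_distance(f, g[:-1] + wc) + 1     # insert rightmost root of g
    if not g:
        _, vc = f[-1]
        return forest_distance(f[:-1] + vc, g) + 1     # delete rightmost root of f
    (vl, vc), (wl, wc) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + vc, g) + 1,           # delete rightmost root of f
        forest_distance(f, g[:-1] + wc) + 1,           # insert rightmost root of g
        forest_distance(f[:-1], g[:-1])                # pair the two rightmost roots
        + forest_distance(vc, wc) + (vl != wl),
    )

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

def similarity_search(query, trees, threshold):
    """Return the trees whose tree edit distance to query is <= threshold."""
    q = encode(query)
    results = []
    for t in trees:
        # Filter: string distance > 2 * threshold implies tree distance > threshold
        # (valid for the assumed encoding above, not necessarily the paper's bound).
        if levenshtein(q, encode(t)) > 2 * threshold:
            continue
        # Refine: run the expensive exact computation only on the survivors.
        if tree_edit_distance(query, t) <= threshold:
            results.append(t)
    return results

query = ("a", (("b", ()), ("c", ())))
near = ("a", (("b", ()), ("d", ())))
far = ("x", (("y", ()), ("z", ()), ("w", ())))
matches = similarity_search(query, [near, far], threshold=1)
```

In this toy run `far` is discarded by the cheap string comparison alone, and the exact tree distance is computed only for `near`. In a real system the linear scan in the filter step would presumably be replaced by an indexed approximate string search.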

Similar Papers

Similarity evaluation on tree-structured data
Rui Yang ... Anthony K H Tung
14 Jun 2005

Convolutional Embedding for Edit Distance
Xinyan Dai ... Yuxuan Wang
25 Jul 2020

A clique-based method using dynamic programming for computing edit distance between unordered trees
Tomoya Mori ... Atsuhiro Takasu
Journal of Computational Biology | VOL. 19
01 Oct 2012

Toward Efficient Similarity Search under Edit Distance on Hybrid Architectures
Madiha Khalid ... Muhammad Murtaza Yousaf
Information | VOL. 13
26 Sep 2022
