Scaling similarity joins over tree-structured data

Yu Tang, Yilun Cai, Nikos Mamoulis

doi:10.14778/2809974.2809976

Abstract

Given a large collection of tree-structured objects (e.g., XML documents), a similarity join finds all pairs of objects that are similar to each other, as determined by a tree edit distance measure and a similarity threshold. State-of-the-art similarity join methods compare simpler approximations of the objects (e.g., strings) in order to prune pairs that cannot be part of the join result, using distance bounds derived from the approximations. In this paper, we propose a novel similarity join approach based on the dynamic decomposition of the tree objects into subgraphs according to the similarity threshold. Our technique avoids computing the exact distance between two tree objects if they do not share at least one common subgraph. To scale up the join, the computed subgraphs are managed in a two-layer index. Our experimental results on real and synthetic data collections show that our approach outperforms the state-of-the-art methods by up to an order of magnitude.
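
The filter-then-verify idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it works on plain strings rather than trees, and it uses the well-known pigeonhole argument for string edit distance (if two strings are within distance tau, then splitting one of them into tau + 1 segments leaves at least one segment intact as a substring of the other). The function names (`edit_distance`, `segments`, `may_be_similar`, `similarity_join`) are hypothetical, and the nested-loop join stands in for the index that a scalable method would use.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def segments(s, tau):
    """Split s into tau + 1 contiguous segments of nearly equal length."""
    k = tau + 1
    base, extra = divmod(len(s), k)
    out, start = [], 0
    for i in range(k):
        end = start + base + (1 if i < extra else 0)
        out.append(s[start:end])
        start = end
    return out


def may_be_similar(s, t, tau):
    """Filter: keep the pair only if some segment of s occurs unchanged in t.

    At most tau edits can touch at most tau of the tau + 1 segments, so a
    pair within distance tau always passes (no false negatives).
    """
    return any(seg in t for seg in segments(s, tau))


def similarity_join(items, tau):
    """Return all pairs within edit distance tau, verifying only candidates."""
    result = []
    for s, t in combinations(items, 2):
        # The exact (expensive) distance is computed only if the filter passes.
        if may_be_similar(s, t, tau) and edit_distance(s, t) <= tau:
            result.append((s, t))
    return result


if __name__ == "__main__":
    words = ["kitten", "sitten", "mitten", "database"]
    for pair in similarity_join(words, tau=1):
        print(pair)
```

In this toy version the segments play the role that shared subgraphs play in the abstract: a pair with no shared segment is discarded without an exact distance computation. How trees are decomposed, and how many subgraphs are needed for a given threshold, is specific to the paper and is not reproduced here.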

Journal: Proceedings of the VLDB Endowment
Publication Date: Jul 1, 2015
Citations: 39
License type: cc-by-nc-nd

Similar Papers

An Efficient and Optimized Service Discovery Methodology for QoS Aware Service Oriented Business Intelligence
International Review on Computers and Software | VOL. 8
30 Sep 2013

U2-Tree: A Universal Two-Layer Distributed Indexing Scheme for Cloud Storage System
Xiaofeng Gao ... Guihai Chen
IEEE/ACM Transactions on Networking | VOL. 27
01 Feb 2019

A clique-based method using dynamic programming for computing edit distance between unordered trees
Tomoya Mori ... Atsuhiro Takasu
Journal of Computational Biology | VOL. 19
01 Oct 2012
