Efficient Metric Indexing for Similarity Search and Similarity Joins

Lu Chen, Yunjun Gao, Gang Chen, Xinhan Li, Christian S. Jensen

doi:10.1109/tkde.2015.2506556

Abstract

Spatial queries, including similarity search and similarity joins, are useful in many areas, such as multimedia retrieval and data integration. However, they are not supported well by commercial DBMSs. This may be due to the complex data types involved and the need for flexible similarity criteria in real applications. In this paper, we propose a versatile and efficient disk-based index for metric data, the Space-filling curve and Pivot-based B$^{+}$-tree (SPB-tree). This index leverages the B$^{+}$-tree and uses a space-filling curve to cluster data into compact regions, thus achieving storage efficiency. It uses a small set of so-called pivots to significantly reduce the number of distance computations needed when using the index. Further, it uses a separate random access file to support a broad range of data. By design, the SPB-tree is easy to integrate into an existing DBMS. We present efficient algorithms for processing similarity search and similarity joins, as well as corresponding cost models based on SPB-trees. Extensive experiments using both real and synthetic data show that, compared with state-of-the-art competitors, the SPB-tree has a much lower construction cost and a smaller storage size, and supports more efficient similarity search and similarity joins, with highly accurate cost models.
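The abstract names three ingredients: pivots, a space-filling curve, and a B$^{+}$-tree over the resulting keys. The following is a minimal sketch of how these pieces can fit together, not the authors' implementation. All names (`pivot_map`, `z_order`, `build_index`, `range_query`), the choice of a Z-order curve, the grid parameters `cell_width` and `bits`, and the sample data are assumptions made for illustration; a sorted Python list stands in for the B$^{+}$-tree leaf level, and the query scans every entry rather than traversing a tree.

```python
def l2(a, b):
    """Euclidean distance; any metric could be substituted."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def pivot_map(obj, pivots, dist):
    """Map an object to its vector of distances to the pivots."""
    return tuple(dist(obj, p) for p in pivots)

def z_order(cell, bits):
    """Interleave the bits of integer grid coordinates into one Z-order key."""
    key = 0
    for bit in range(bits):
        for dim, c in enumerate(cell):
            key |= ((c >> bit) & 1) << (bit * len(cell) + dim)
    return key

def build_index(data, pivots, dist, cell_width=1.0, bits=4):
    """Return entries (key, mapped vector, object id) sorted by key.
    The sorted list is a stand-in for B+-tree leaves (assumption)."""
    top = (1 << bits) - 1
    entries = []
    for obj_id, obj in enumerate(data):
        mapped = pivot_map(obj, pivots, dist)
        cell = tuple(min(int(d / cell_width), top) for d in mapped)
        entries.append((z_order(cell, bits), mapped, obj_id))
    entries.sort()
    return entries

def range_query(entries, data, pivots, dist, query, radius):
    """Return (matching ids, number of object distance computations)."""
    q_mapped = pivot_map(query, pivots, dist)
    results, computed = [], 0
    for _key, mapped, obj_id in entries:
        # Triangle inequality: |d(q,p) - d(o,p)| <= d(q,o) for every pivot p,
        # so a gap larger than the radius rules the object out for free.
        if any(abs(qd - od) > radius for qd, od in zip(q_mapped, mapped)):
            continue
        computed += 1
        if dist(query, data[obj_id]) <= radius:
            results.append(obj_id)
    return sorted(results), computed

# Illustrative data only.
points = [(0, 0), (1, 1), (5, 5), (9, 9), (2, 0), (8, 7)]
pivots = [(0, 0), (9, 9)]
index = build_index(points, pivots, l2)
hits, n_dist = range_query(index, points, pivots, l2, query=(1, 0), radius=1.5)
```

Here `hits` holds the ids of the objects within the radius and `n_dist` counts how many object distances were actually computed after pivot filtering. A disk-based index would additionally use the key order to skip whole nodes whose key ranges cannot intersect the search region, which this linear scan does not attempt.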

Journal: IEEE Transactions on Knowledge and Data Engineering
Publication Date: Mar 1, 2017
Citations: 37

Similar Papers

Efficient metric indexing for similarity search
Lu Chen ... Xinhan Li
01 Apr 2015

On Fuzzy vs. Metric Similarity Search in Complex Databases
Alan Eckhardt ... Peter Vojtáš
01 Jan 2009

Efficient Similarity Search in Nonmetric Spaces with Local Constant Embedding
Lei Chen ... Xiang Lian
IEEE Transactions on Knowledge and Data Engineering | VOL. 20
01 Jan 2008

B-Bit Sketch Trie: Scalable Similarity Search on Integer Sketches
Shunsuke Kanda ... Yasuo Tabei
01 Dec 2019
