Abstract

Locality-sensitive hashing (LSH) is a widely used approach to c-approximate nearest neighbor (c-ANN) search because of its appealing theoretical guarantees and empirical performance. However, available LSH-based solutions do not achieve a good balance between cost and quality because of certain limitations in their index structures. In this paper, we propose a novel and easy-to-implement disk-based method named R2LSH to answer ANN queries in high-dimensional spaces. In the indexing phase, R2LSH maps data objects into multiple two-dimensional projected spaces. In each space, a group of B+-trees is constructed to characterize the corresponding data distribution. In the query phase, by setting a query-centric ball in each projected space and using a dynamic counting technique, R2LSH efficiently determines candidates and returns query results with the required quality. Rigorous theoretical analysis reveals that the proposed algorithm supports c-ANN search for arbitrarily small c ≥ 1 with a probability guarantee. Extensive experiments on real datasets verify the superiority of R2LSH over state-of-the-art methods.
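
The two phases described above can be illustrated with a minimal in-memory sketch. It is not the R2LSH implementation: it replaces the disk-based B+-trees with plain lists and a linear scan, and the function names (`build_index`, `query`) and the parameters `m`, `radius`, `growth`, `threshold`, and `max_rounds` are assumptions chosen for illustration. It shows only the general idea of projecting points into several two-dimensional spaces, placing a ball around the projected query in each space, and counting how often each point falls inside a ball while the radius grows.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def euclidean(u, v):
    """Euclidean distance in the original space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def build_index(data, m, seed=0):
    """Indexing phase (sketch): project every point into m 2-D spaces.

    Each space is defined by two random Gaussian vectors. A real
    disk-based index would organize the projections in B+-trees;
    here they are kept in a plain list (assumption for brevity).
    """
    rng = random.Random(seed)
    dim = len(data[0])
    spaces = []
    for _ in range(m):
        a1 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        a2 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        projected = [(dot(a1, o), dot(a2, o)) for o in data]
        spaces.append((a1, a2, projected))
    return spaces


def query(data, spaces, q, k=1, radius=1.0, growth=2.0,
          threshold=None, max_rounds=20):
    """Query phase (sketch): query-centric balls plus collision counting.

    A point becomes a candidate once it has fallen inside the ball in
    at least `threshold` spaces. The radius grows by `growth` each
    round until k candidates exist or `max_rounds` is reached.
    """
    m = len(spaces)
    if threshold is None:
        threshold = max(1, m // 2)  # hypothetical default, not from the paper
    q_proj = [(dot(a1, q), dot(a2, q)) for a1, a2, _ in spaces]
    counts = [0] * len(data)
    counted = [set() for _ in range(m)]  # points already counted per space
    candidates = set()
    for _ in range(max_rounds):
        for s, (_, _, projected) in enumerate(spaces):
            qx, qy = q_proj[s]
            for i, (x, y) in enumerate(projected):
                if i in counted[s]:
                    continue
                if math.hypot(x - qx, y - qy) <= radius:
                    counted[s].add(i)
                    counts[i] += 1
                    if counts[i] >= threshold:
                        candidates.add(i)
        if len(candidates) >= k:
            break
        radius *= growth
    # Verify candidates with true distances in the original space.
    ranked = sorted(candidates, key=lambda i: euclidean(data[i], q))
    return ranked[:k]
```

Calling `spaces = build_index(data, m=8)` once and then `query(data, spaces, q, k=3)` returns the indices of up to three candidates ordered by true distance. The choice of `threshold` and the radius schedule governs the trade-off between cost and quality in this sketch; the values used here carry no theoretical guarantee.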

