MQH

Kejing Lu,Chuan Xiao,Yoshiharu Ishikawa

doi:10.14778/3574245.3574269

Abstract

Point-to-hyperplane nearest neighbor search (P2HNNS) is a fundamental problem which has many applications in data mining and machine learning. In this paper, we propose a provable Locality-Sensitive-Hashing (LSH) scheme based on multi-level quantization errors to solve this problem. In the indexing phase, for each data point, we compute the hash values of its residual vectors generated by a stepwise quantization process. In the query phase, for each processed point, we first determine its suitable level for hashing and then determine the size of hash bucket based on its quantization error in that level. We theoretically show that this treatment not only yields a probability guarantee on query results, but also makes the generated hash functions much more efficient to prune those false points. Experimental results on five real datasets show that the proposed approach generally runs 2X-10X faster than the state-of-the-art LSH-based approaches.

Full Text