Locality Sensitive Hashing with Extended Partitioning Boundaries

Keon Myung Lee

doi:10.4028/www.scientific.net/amm.321-324.804

Abstract

Locality-sensitive hashing is a technique for fast approximate nearest neighbor search over large volumes of data. Binary-code locality-sensitive hashing distributes a data set into buckets labeled with binary codes, where each code is determined by a set of hash functions. The binary hash codes thus partition the data space into subspaces. When close neighbors of a query lie near subspace boundaries, they may fall into different buckets and the search can fail to locate them, so neighboring buckets must also be checked when finding nearest neighbors. This paper presents a technique that enhances search performance by introducing the notion of an extended boundary. It reduces potential misses and search overhead, especially for regions located at the double-napped corners.

Keywords: locality sensitive hashing, data search, hashing, data analysis
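The boundary problem described in the abstract can be illustrated with a minimal sketch of binary-code hashing. This is not the paper's extended-boundary technique; it assumes sign-of-projection hash functions (hyperplanes through the origin) and the conventional remedy of probing every bucket whose code differs in one bit. All function names here are hypothetical.

```python
import numpy as np
from collections import defaultdict

def make_hyperplanes(dim, n_bits, seed=0):
    # Assumption: each hash function is a random hyperplane through the origin.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, dim))

def binary_code(planes, x):
    # One bit per hash function: which side of the hyperplane x lies on.
    return tuple(int(b) for b in (planes @ x > 0))

def build_index(planes, data):
    # Group point indices by their binary code (the bucket label).
    buckets = defaultdict(list)
    for i, x in enumerate(data):
        buckets[binary_code(planes, x)].append(i)
    return buckets

def neighbor_codes(code):
    # Codes at Hamming distance 1, i.e. buckets across one boundary.
    for j in range(len(code)):
        yield code[:j] + (1 - code[j],) + code[j + 1:]

def query(planes, buckets, data, q, probe_neighbors=True):
    # Collect candidates from the query's bucket, optionally from adjacent ones,
    # then return the index of the closest candidate (or None if there are none).
    code = binary_code(planes, q)
    candidates = list(buckets.get(code, []))
    if probe_neighbors:
        for c in neighbor_codes(code):
            candidates.extend(buckets.get(c, []))
    if not candidates:
        return None
    dists = np.linalg.norm(data[candidates] - q, axis=1)
    return candidates[int(np.argmin(dists))]
```

With `probe_neighbors=False`, a query lying just inside one subspace can miss a closer point that sits just across the boundary. Probing adjacent buckets recovers it at the cost of one extra bucket lookup per code bit, which is the kind of overhead the paper aims to reduce.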
