Abstract

Approximate nearest neighbor (ANN) search over high-dimensional multimedia data has become an indispensable service for online applications. Returning fast, high-quality results for unseen queries is the greatest challenge that most algorithms face. Locality Sensitive Hashing (LSH) is a well-known ANN search algorithm, but it suffers from an inefficient index structure and poor accuracy in distributed settings. Traditional index structures have a most significant bits (MSB) problem: their indexing strategies implicitly assume that the bits from one direction of the hash value have higher priority. In this paper, we propose a new content-based index called Random Draw Forest (RDF), which not only applies a content-based partition strategy to reduce the search range for fast query response, but also uses shuffling permutations on hash values to solve the MSB problem. We also study the trade-off between query efficiency and accuracy after applying our partition strategy. In the experiments, we show the effect of the parameters and the strong performance of RDF compared with other LSH-based methods in meeting the demands of online ANN search.
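
The MSB problem and the effect of shuffling permutations can be illustrated with a minimal sketch. This is not the RDF implementation: the 8-bit hash codes, the prefix-matching rule, and all function names (`common_prefix_len`, `make_permutations`, `permute`, `best_prefix_match`) are assumptions made for illustration only. The sketch assumes a prefix-based index, in which two hash values are treated as close when they share a long leading run of bits, so a single mismatch in the first bit hides an otherwise identical pair.

```python
import random


def common_prefix_len(a, b):
    """Number of leading positions on which two bit tuples agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def make_permutations(num_bits, num_trees, seed=0):
    """Draw one random bit ordering per tree (hypothetical helper)."""
    rng = random.Random(seed)
    perms = []
    for _ in range(num_trees):
        p = list(range(num_bits))
        rng.shuffle(p)
        perms.append(p)
    return perms


def permute(code, perm):
    """Reorder the bits of a hash code according to perm."""
    return tuple(code[i] for i in perm)


def best_prefix_match(a, b, perms):
    """Longest shared prefix of a and b over all shuffled orderings."""
    return max(common_prefix_len(permute(a, p), permute(b, p)) for p in perms)


# Two hash codes that differ only in the first (most significant) bit.
a = (0, 1, 1, 0, 1, 0, 0, 1)
b = (1, 1, 1, 0, 1, 0, 0, 1)

# Without shuffling, a prefix-based index sees no shared prefix at all.
unshuffled = common_prefix_len(a, b)

# With several shuffled orderings, the differing bit may land late in
# at least one ordering, so the pair shares a longer prefix there.
perms = make_permutations(num_bits=8, num_trees=4)
shuffled = best_prefix_match(a, b, perms)
```

In this sketch, the shared prefix under a given ordering equals the position to which that ordering moves the differing bit, so using more orderings raises the chance that some tree places near-identical codes in the same branch.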
