Abstract

Locality‐sensitive hashing (LSH) is widely used for nearest neighbor search over large‐scale, high‐dimensional data. However, existing LSH methods suffer from a serious imbalance between the efficiency of index structure construction and query accuracy. In this article, we propose a novel higher‐entropy‐hyperplane clusters LSH (HEHC‐LSH) algorithm. We improve vector quantization to preprocess the data, which greatly shortens the preprocessing time. We integrate the maximum entropy principle into the distribution estimation algorithm to construct a novel hash function cluster method, incorporate bootstrap aggregating from ensemble learning, and adopt a parallel index dictionary to improve the generalization performance of the index structure. In the query stage, we apply the ensemble learning idea to filter the index set comprehensively, which not only avoids a large number of distance calculations but also improves the quality of the query results. We also analyze the rationality and effectiveness of the proposed method. Finally, extensive experimental results show that HEHC‐LSH achieves higher precision and efficiency simultaneously compared with current methods, and remains robust across different datasets.
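For orientation, the following is a minimal sketch of plain random‐hyperplane LSH with several independent hash tables and a vote threshold at query time. It is not the HEHC‐LSH algorithm: the hyperplanes here are drawn at random rather than chosen by a maximum entropy criterion, no vector quantization or bootstrap sampling is performed, and the vote threshold is only a simple stand‐in for the ensemble‐style filtering of the index set. All function names and parameters (`build_tables`, `query`, `min_votes`, and so on) are hypothetical and chosen for illustration.

```python
import random


def make_hyperplanes(dim, n_bits, rng):
    """Draw n_bits random hyperplane normals (assumption: Gaussian entries)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_key(planes, v):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0 else 0
        for p in planes
    )


def build_tables(data, n_tables, n_bits, seed=0):
    """Build n_tables independent hash tables (dictionaries of buckets)."""
    rng = random.Random(seed)
    dim = len(data[0])
    tables = []
    for _ in range(n_tables):
        planes = make_hyperplanes(dim, n_bits, rng)
        buckets = {}
        for idx, v in enumerate(data):
            buckets.setdefault(hash_key(planes, v), []).append(idx)
        tables.append((planes, buckets))
    return tables


def query(tables, data, q, min_votes=2, k=1):
    """Keep candidates that collide with q in at least min_votes tables,
    then rank only those survivors by squared Euclidean distance."""
    votes = {}
    for planes, buckets in tables:
        for idx in buckets.get(hash_key(planes, q), []):
            votes[idx] = votes.get(idx, 0) + 1
    candidates = [i for i, c in votes.items() if c >= min_votes]

    def sqdist(i):
        return sum((a - b) ** 2 for a, b in zip(data[i], q))

    return sorted(candidates, key=sqdist)[:k]
```

Raising `min_votes` shrinks the candidate set and so reduces the number of exact distance computations, at the cost of possibly discarding true neighbors; this is the trade‐off between efficiency and accuracy that the abstract refers to. A query vector identical to a stored point always collides with it in every table, so it is returned whenever `min_votes` does not exceed the number of tables.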
