Locality-sensitive hashing (LSH) has gained ever-increasing popularity in similarity search for large-scale data. It has competitive search performance when the number of generated hash bits is large, reversely bringing adverse dilemmas for its wide applications. The first purpose of this work is to introduce a novel hash bit reduction schema for hashing techniques to derive shorter binary codes, which has not yet received sufficient concerns. To briefly show how the reduction schema works, the second purpose is to present an effective bit reduction method for LSH under the reduction schema. Specifically, after the hash bits are generated by LSH, they will be put into bit pool as candidates. Then mutual information and data labels are exploited to measure the correlation and structural properties between the hash bits, respectively. Eventually, highly correlated and redundant hash bits can be distinguished and then removed accordingly, without deteriorating the performance greatly. The advantages of our reduction method include that it can not only reduce the number of hash bits effectively but also boost retrieval performance of LSH, making it more appealing and practical in real-world applications. Comprehensive experiments were conducted on three public real-world datasets. The experimental results with representative bit selection methods and the state-of-the-art hashing algorithms demonstrate that the proposed method has encouraging and competitive performance.
Read full abstract