Boosting multiple hash tables to search

Jin-Cheng Li

doi:10.1109/icmlc.2012.6358886

Abstract

Hashing-based approximate nearest neighbor (ANN) search is an important technique for reducing search time and storage in large-scale information retrieval problems. Semi-supervised hashing (SSH) methods capture the semantic similarity of data while avoiding overfitting to the training data. SSH outperforms supervised, unsupervised, and random-projection-based hashing methods in semantic retrieval tasks. However, current semi-supervised and supervised hashing methods search by hash lookup in a single hash table, which usually suffers from low recall. Achieving high recall then requires an exhaustive search by Hamming ranking, which dramatically decreases retrieval precision and increases search time. In this paper, we propose to overcome this problem by learning multiple semi-supervised hash tables with a boosting technique. The hash tables are learned sequentially, with boosting used to maximize the hashing accuracy of each table. Samples that are mis-hashed by the current hash table are penalized with large weights, and the algorithm uses the updated weights to learn the next hash table. Given a query, truly semantically similar samples that are missed in the active buckets of one hash table (the buckets within a small Hamming radius of the query) are more likely to be found in the active buckets of the next hash table. Experimental results show that our method achieves high recall while preserving high precision, outperforming several state-of-the-art hashing methods.
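The sequential scheme described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract gives no formulas, so the per-table learner (top eigenvectors of a weighted covariance, in the spirit of SSH), the pair weighting w_i * w_j, the exponential weight update, and all function names (`learn_table`, `boost_tables`, `lookup`) are assumptions made for illustration. Active buckets are emulated by brute-force Hamming distances rather than a real bucket index.

```python
import numpy as np

def hash_codes(X, W):
    """Binary codes: sign of linear projections, one bit per column of W."""
    return (X @ W > 0).astype(np.uint8)

def hamming_to_all(code, codes):
    """Hamming distance from one code to every row of `codes`."""
    return (code != codes).sum(axis=1)

def learn_table(X, Xl, S, w, n_bits, eta=1.0):
    """Assumed SSH-style learner: top eigenvectors of a weighted covariance."""
    Sw = S * np.outer(w, w)                      # assumption: pair weight = w_i * w_j
    M = Xl.T @ Sw @ Xl / len(Xl) + eta * X.T @ X / len(X)
    _, vecs = np.linalg.eigh(M)                  # eigenvalues in ascending order
    return vecs[:, -n_bits:]                     # projection matrix, shape (d, n_bits)

def boost_tables(X, Xl, yl, n_tables=3, n_bits=4, radius=1):
    """Learn tables sequentially, up-weighting labeled samples that were mis-hashed."""
    S = np.where(yl[:, None] == yl[None, :], 1.0, -1.0)  # +1 similar, -1 dissimilar
    similar = S > 0
    np.fill_diagonal(similar, False)             # a sample is not its own partner
    w = np.ones(len(Xl))
    tables = []
    for _ in range(n_tables):
        W = learn_table(X, Xl, S, w, n_bits)
        codes = hash_codes(Xl, W)
        dist = (codes[:, None, :] != codes[None, :, :]).sum(axis=2)
        # Miss rate: share of similar partners lying outside the active buckets.
        missed = (similar & (dist > radius)).sum(axis=1)
        miss_rate = missed / np.maximum(similar.sum(axis=1), 1)
        w = w * np.exp(miss_rate)                # assumed exponential re-weighting
        w *= len(w) / w.sum()                    # keep the mean weight at 1
        tables.append(W)
    return tables

def lookup(q, X, tables, radius=1):
    """Union of the candidates found within `radius` of the query in each table."""
    hits = np.zeros(len(X), dtype=bool)
    for W in tables:
        q_code = hash_codes(q[None, :], W)[0]
        hits |= hamming_to_all(q_code, hash_codes(X, W)) <= radius
    return np.flatnonzero(hits)

# Toy usage on synthetic, zero-centered data; the first 30 rows carry labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
X -= X.mean(axis=0)
yl = np.repeat([0, 1, 2], 10)
tables = boost_tables(X, X[:30], yl)
candidates = lookup(X[0], X, tables)
```

Because `lookup` takes the union over tables, adding a table can only add candidates, which is the mechanism by which multiple tables raise recall; whether precision is preserved depends on how well each table is learned, which this toy learner does not attempt to reproduce.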
