R2PCAH: Hashing with two-fold randomness on principal projections

Peng Li, Peng Ren

doi:10.1016/j.neucom.2017.01.019

Abstract

Hashing-based strategies have recently been widely used for fast similarity search on large-scale datasets. Data-independent methods such as Locality Sensitive Hashing (LSH) usually adopt random projections as hash functions, with theoretical guarantees that performance improves as the code length increases. They therefore require relatively long codes, making them less effective than data-dependent methods. On the other hand, many data-dependent hashing methods use Principal Component Analysis (PCA) to generate compact hash codes. However, PCA-based methods tend not to be effective for generating long codes, because projections with small variances may introduce redundancy and noise. To address these deficiencies, we present an R2PCAH framework that conducts two-fold random transformations based on principal projections for hash code learning. Specifically, only the top PCA projections of the training data are extracted, and two-fold random transformations, i.e., random rotations and random shifts, are performed on the projected data to generate several component short codes. The multiple component short codes are then concatenated into one long code. We observe that our method shares the advantages of both LSH and PCA-based hashing methods. Extensive experiments demonstrate the effectiveness of the proposed method.
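The pipeline outlined in the abstract (top PCA projections, random rotations and shifts, concatenation of short codes) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how rotations or shifts are sampled or how the projections are binarized, so the sketch assumes rotations drawn via QR decomposition of a Gaussian matrix, small Gaussian shifts, and thresholding at zero. The function names `fit_r2pcah_sketch` and `encode_sketch` and the parameters `n_top`, `n_pieces`, and `shift_scale` are hypothetical.

```python
import numpy as np


def fit_r2pcah_sketch(X, n_top=8, n_pieces=4, shift_scale=0.1, seed=0):
    """Learn top PCA directions plus several random (rotation, shift) pairs.

    Assumptions (not taken from the paper): rotations come from the QR
    decomposition of a Gaussian matrix, and shifts are Gaussian with a
    standard deviation proportional to that of the projected data.
    """
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean

    # Top principal directions from the SVD of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_top].T                      # shape (d, n_top)
    V = Xc @ W                            # projected training data

    pieces = []
    for _ in range(n_pieces):
        # Random orthogonal matrix (assumed sampling scheme).
        Q, _ = np.linalg.qr(rng.standard_normal((n_top, n_top)))
        # Random shift (assumed distribution and scale).
        b = rng.normal(scale=shift_scale * V.std(), size=n_top)
        pieces.append((Q, b))
    return mean, W, pieces


def encode_sketch(X, model):
    """Concatenate one short binary code per (rotation, shift) pair."""
    mean, W, pieces = model
    V = (X - mean) @ W
    # Thresholding at zero is an assumption made for this sketch.
    short_codes = [(V @ Q + b) > 0 for Q, b in pieces]
    return np.concatenate(short_codes, axis=1).astype(np.uint8)


if __name__ == "__main__":
    data = np.random.default_rng(1).standard_normal((200, 16))
    model = fit_r2pcah_sketch(data, n_top=8, n_pieces=4)
    codes = encode_sketch(data, model)
    print(codes.shape)
```

With `n_top=8` and `n_pieces=4`, each point receives a code of 8 × 4 = 32 bits, all derived from the same eight high-variance directions. This reflects the stated motivation: longer codes without relying on low-variance PCA projections.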

Full Text