Deep hashing is widely used in multimedia retrieval systems because of its storage and computation efficiency. Unsupervised hashing has attracted considerable attention in recent years because it does not rely on label information. However, existing deep unsupervised hashing methods usually use rough pairwise relations to constrain the similarity between hash codes locally, which is both insufficient and inefficient for reconstructing accurate correlations across samples. To address this issue, we propose a generic distillation framework that preserves similarity relationships. Specifically, we design a distillation loss that aligns the batchwise similarity distributions of the feature space and the hash code space, allowing us to capture the global correlation knowledge contained in features and propagate it into hash codes efficiently. This framework can be applied to both intra-modal and inter-modal scenarios. Furthermore, we design a new quantization method that quantizes continuous values to a clipping value instead of ±1, reducing the inconsistency between continuous features and hash codes. This method can also avoid the vanishing gradient problem during training. Finally, extensive experiments on image hashing retrieval and cross-modal hashing retrieval on public datasets demonstrate that the proposed method yields compact hash codes and outperforms state-of-the-art baselines.
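As a rough illustration of the two ideas above, the following sketch compares batchwise similarity distributions in feature space and hash code space, and clips continuous outputs before binarization. The abstract does not specify the exact formulation, so the cosine similarity, the softmax temperature, the KL divergence, the clamp-style clipping, and all function names here are assumptions chosen for illustration, not the method as published.

```python
import math


def cosine_similarity_matrix(rows):
    """Pairwise cosine similarity between all rows of a batch."""
    norms = [math.sqrt(sum(x * x for x in r)) or 1.0 for r in rows]
    return [
        [sum(a * b for a, b in zip(ri, rj)) / (ni * nj)
         for rj, nj in zip(rows, norms)]
        for ri, ni in zip(rows, norms)
    ]


def row_softmax(matrix, temperature):
    """Turn each row of similarities into a probability distribution."""
    result = []
    for row in matrix:
        top = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp((v - top) / temperature) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def similarity_distillation_loss(features, codes, temperature=0.5):
    """Mean KL(P_feature || P_code) over the rows of one batch.

    Assumption: the "teacher" distribution comes from the features and the
    "student" distribution from the continuous hash codes.
    """
    p = row_softmax(cosine_similarity_matrix(features), temperature)
    q = row_softmax(cosine_similarity_matrix(codes), temperature)
    kl = sum(
        pi * math.log(pi / qi)
        for p_row, q_row in zip(p, q)
        for pi, qi in zip(p_row, q_row)
    )
    return kl / len(features)


def clip_quantize(values, clip=0.5):
    """Clamp continuous outputs to [-clip, clip] (assumed form of clipping)."""
    return [[max(-clip, min(clip, v)) for v in row] for row in values]


def binarize(values):
    """Final hash codes: the sign of each continuous output."""
    return [[1 if v >= 0 else -1 for v in row] for row in values]
```

In this sketch the loss is zero when the hash codes reproduce the similarity structure of the features exactly and grows as the two distributions diverge. Because each row of the similarity matrix covers the whole batch, every sample is compared against all others at once rather than through isolated pairs.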