Hashing and quantization methods based on deep learning are widely used in large-scale image retrieval. Most existing methods train their models in a traditional supervised manner. Although they achieve excellent performance, they rely heavily on expensive label information and do not fully utilize the available data. They also over-encourage binary codes to preserve the original information, which may cause the model to encode redundant information at the expense of the discriminative information that matters most for retrieval. To address these issues, we propose a self-supervised method that requires no label information, called Contrastive Self-Supervised Weak-Orthogonal Product Quantization (CSWPQ). The model is trained in a label-free, self-supervised manner by maximizing the similarity between different views of the same image and minimizing the similarity between views of different images, which both improves generalization and enhances robustness. Unlike traditional contrastive learning, we introduce a quantized contrastive learning scheme that learns more discriminative image representations by contrasting quantized features constructed from different views of an image. In addition, to reduce the quantization error caused by product quantization, we impose a weak-orthogonal constraint on the codebooks, which increases the diversity of the representations, reduces information loss and computational overhead during quantization, and speeds up computation. Extensive experiments on the CIFAR-10, NUS-WIDE, and FLICKR25K datasets show that the proposed method achieves superior retrieval performance.
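To make the two ideas in the abstract concrete, the sketch below shows one possible form of a quantized contrastive loss combined with a weak-orthogonality penalty on the codebooks. This is not the authors' released implementation: the soft product-quantization routine, the loss weighting, and all function and parameter names (`soft_product_quantize`, `weak_orthogonal_penalty`, `tau`, `temperature`, the 0.1 coefficient) are illustrative assumptions.

```python
# Hypothetical sketch in the spirit of CSWPQ, assuming M codebooks of K codewords
# each, differentiable (soft) product quantization, and a weak-orthogonality
# penalty on the codewords within each codebook.
import torch
import torch.nn.functional as F

def soft_product_quantize(z, codebooks, tau=1.0):
    """Soft-assign each sub-vector of z to its codebook (differentiable PQ).
    z: (B, M*D) features; codebooks: (M, K, D) codewords."""
    M, K, D = codebooks.shape
    z = z.view(z.size(0), M, D)                         # split into M sub-vectors
    logits = torch.einsum('bmd,mkd->bmk', z, codebooks) / tau
    probs = logits.softmax(dim=-1)                      # soft codeword assignments
    q = torch.einsum('bmk,mkd->bmd', probs, codebooks)  # soft-quantized sub-vectors
    return q.reshape(z.size(0), -1)                     # (B, M*D)

def quantized_contrastive_loss(z1, z2, codebooks, temperature=0.2):
    """Contrast features of one view against quantized features of the other view."""
    q2 = F.normalize(soft_product_quantize(z2, codebooks), dim=-1)
    z1 = F.normalize(z1, dim=-1)
    logits = z1 @ q2.t() / temperature                  # (B, B) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)             # positives lie on the diagonal

def weak_orthogonal_penalty(codebooks):
    """Encourage codewords within each codebook to be (weakly) orthogonal."""
    M, K, D = codebooks.shape
    c = F.normalize(codebooks, dim=-1)
    gram = torch.einsum('mkd,mjd->mkj', c, c)           # (M, K, K) Gram matrices
    eye = torch.eye(K, device=codebooks.device)
    return ((gram - eye) ** 2).mean()                   # penalize off-diagonal similarity

# Usage example with an assumed weighting of the orthogonality term.
B, M, K, D = 32, 4, 16, 64
z1, z2 = torch.randn(B, M * D), torch.randn(B, M * D)  # features of two augmented views
codebooks = torch.nn.Parameter(torch.randn(M, K, D))
loss = quantized_contrastive_loss(z1, z2, codebooks) + 0.1 * weak_orthogonal_penalty(codebooks)
loss.backward()
```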