Cross-modal hashing aims to map data points from different modalities into a shared Hamming space, providing fast and flexible search services. Recent progress in contrastive learning has significantly improved the retrieval performance of cross-modal hashing, owing to its strong ability to learn powerful representations. However, most existing methods face two primary challenges: (1) they select negative samples for each anchor randomly within a mini-batch, which may introduce false negatives that destroy the intrinsic semantic similarity of the learned hash codes; (2) a continuous relaxation strategy is commonly used to generate discrete hash codes, which introduces quantization errors and consequently impairs code quality. To alleviate these issues, we propose a deep supervised cross-modal hashing scheme, termed Supervised Contrastive Discrete Hashing (SCDH). Specifically, we extend unsupervised contrastive learning to the supervised setting and construct positive and negative samples using class labels, avoiding the performance degradation caused by false negatives. In addition, multiple positives and negatives for each anchor are selected from a global dictionary bank according to class labels, which better captures the global data structure. Moreover, discrete hash codes are generated directly, enhancing the representational capacity of the hash codes. Comprehensive experiments on two public datasets confirm that the proposed SCDH outperforms several state-of-the-art methods.
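To make the label-guided selection of positives and negatives concrete, the following is a minimal sketch, not the paper's implementation, of a supervised contrastive loss for cross-modal hash learning. It assumes a PyTorch setup and a multi-label dataset where two samples count as positives if they share at least one class label; the straight-through binarization and all function names are illustrative assumptions rather than SCDH's exact formulation.

```python
# Minimal sketch (assumed, not the authors' code): label-guided positive/negative
# selection for contrastive cross-modal hashing, with sign-based binarization
# and a straight-through estimator so gradients still flow through the codes.
import torch
import torch.nn.functional as F

def sign_with_ste(h):
    """Binarize continuous outputs to {-1, +1}; identity gradient (STE)."""
    b = torch.sign(h)
    return h + (b - h).detach()

def supervised_contrastive_hash_loss(img_feat, txt_feat, labels, tau=0.3):
    """img_feat, txt_feat: (N, d) continuous hash outputs of two modalities.
    labels: (N, C) multi-hot class labels used to pick positives/negatives."""
    b_img = F.normalize(sign_with_ste(img_feat), dim=1)
    b_txt = F.normalize(sign_with_ste(txt_feat), dim=1)

    # Cross-modal similarity logits between image anchors and text candidates.
    logits = b_img @ b_txt.t() / tau                      # (N, N)

    # Two samples are positives iff they share at least one label, so
    # same-class items are never treated as (false) negatives.
    pos_mask = (labels @ labels.t() > 0).float()          # (N, N)

    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # Average log-probability over all label-defined positives per anchor.
    loss = -(pos_mask * log_prob).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss.mean()

# Toy usage: 8 samples, 16-bit codes, 5 classes.
img = torch.randn(8, 16, requires_grad=True)
txt = torch.randn(8, 16, requires_grad=True)
lbl = (torch.rand(8, 5) > 0.6).float()
supervised_contrastive_hash_loss(img, txt, lbl).backward()
```

Under these assumptions, the label-derived mask replaces random in-batch negative sampling, and a dictionary bank could be emulated by concatenating stored codes with the in-batch candidates before computing the logits.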