Abstract

Unsupervised hashing for cross-modal retrieval has received much attention in the data mining community. Recent methods rely on image-text paired data and learn hash codes from small batches of samples. Existing models suffer from two main limitations: (1) cross-modal representations are learned only within each batch, ignoring global relations; (2) semantically similar samples may be wrongly treated as negatives. In this paper, we propose a novel category-level contrastive learning method for unsupervised cross-modal hashing that alleviates both problems and improves cross-modal query accuracy. To break the limitation of batch-wise learning, we first propose a selected memory module that takes global relations into account. We then obtain pseudo labels through clustering and combine them with a Hadamard matrix for category-centered learning. To reduce false negatives, we further propose a memory bank that stores clusters of samples and constructs negatives by selecting samples from categories different from the anchor's for contrastive learning. Extensive experiments show that our approach significantly outperforms state-of-the-art models on the MIRFLICKR-25K and NUS-WIDE datasets.
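To make two of the ideas above concrete, here is a minimal, hypothetical sketch (not the authors' implementation): Hadamard-matrix rows serve as orthogonal binary centers for pseudo-label clusters, and contrastive negatives are drawn only from clusters other than the anchor's. The names `hash_dim`, `n_clusters`, and the toy `features` array are illustrative assumptions.

```python
# Sketch only: pseudo-label clustering + Hadamard centers + cluster-aware
# negative sampling, under assumed names and toy random data.
import numpy as np
from scipy.linalg import hadamard
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
hash_dim, n_clusters, n_samples = 16, 8, 200  # assumed toy sizes
features = rng.normal(size=(n_samples, hash_dim))  # stand-in embeddings

# Pseudo labels via clustering; each cluster gets one Hadamard row
# (rows are mutually orthogonal +/-1 vectors) as its binary hash center.
pseudo_labels = KMeans(n_clusters=n_clusters, n_init=10,
                       random_state=0).fit_predict(features)
centers = hadamard(hash_dim)[:n_clusters]      # (n_clusters, hash_dim)
target_codes = centers[pseudo_labels]          # per-sample binary targets

# Cluster-indexed memory bank: negatives for an anchor are sampled only
# from clusters with a different pseudo label, reducing false negatives.
bank = {c: features[pseudo_labels == c] for c in range(n_clusters)}

def sample_negatives(anchor_label, k=5):
    """Draw k negatives from clusters other than the anchor's."""
    other = np.concatenate([v for c, v in bank.items() if c != anchor_label])
    idx = rng.choice(len(other), size=k, replace=False)
    return other[idx]

negs = sample_negatives(anchor_label=pseudo_labels[0])
print(target_codes.shape, negs.shape)  # (200, 16) (5, 16)
```

Using orthogonal Hadamard rows as centers keeps the binary targets of different pseudo-categories maximally separated in Hamming space, which is one common motivation for pairing clustering with a Hadamard matrix.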
