Abstract

Cross-modal hashing has attracted considerable attention and achieved remarkable success in large-scale cross-media similarity retrieval because of its superior computational efficiency and low storage overhead. However, constructing similarity relationships among samples in unsupervised cross-modal hashing is challenging due to the lack of manual annotation. Most existing unsupervised methods directly use the representations extracted from the backbone of each modality to construct instance similarity matrices, yielding inaccurate similarity matrices and, consequently, suboptimal hash codes. To address this issue, we propose a novel unsupervised hashing model named Structure-aware Contrastive Hashing for Unsupervised Cross-modal Retrieval (SACH). Specifically, we concurrently employ both the high-dimensional representations and the discriminative representations learned by the network to construct a more informative semantic correlation matrix across modalities. Moreover, we design a multimodal structure-aware alignment network to minimize the heterogeneity gap in the high-order semantic space of each modality, effectively reducing disparities between heterogeneous data sources and enhancing the consistency of semantic information across modalities. Extensive experiments on two widely used datasets demonstrate the superiority of the proposed SACH over existing state-of-the-art methods on cross-modal retrieval tasks.
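To illustrate the idea of fusing high-dimensional backbone features with discriminative (hash-layer) representations into a single cross-modal affinity matrix, the following minimal PyTorch sketch shows one possible construction. The function names, the cosine-based affinities, and the fusion weights alpha and beta are assumptions made for illustration only, not SACH's published formulation.

# Illustrative sketch only: weighting scheme and normalization are assumptions,
# not the formulation described in the paper.
import torch
import torch.nn.functional as F

def cosine_affinity(x: torch.Tensor) -> torch.Tensor:
    # Pairwise cosine-similarity matrix for a batch of features (n x d -> n x n).
    x = F.normalize(x, dim=1)
    return x @ x.t()

def fused_similarity(img_feat, txt_feat, img_disc, txt_disc, alpha=0.5, beta=0.5):
    # Combine high-dimensional backbone features and discriminative
    # representations from both modalities into one semantic correlation matrix.
    s_img = cosine_affinity(img_feat)   # image affinity, backbone features
    s_txt = cosine_affinity(txt_feat)   # text affinity, backbone features
    d_img = cosine_affinity(img_disc)   # image affinity, discriminative features
    d_txt = cosine_affinity(txt_disc)   # text affinity, discriminative features
    # Fuse within each modality, then across modalities (hypothetical weights).
    s_img_fused = alpha * s_img + (1 - alpha) * d_img
    s_txt_fused = alpha * s_txt + (1 - alpha) * d_txt
    return beta * s_img_fused + (1 - beta) * s_txt_fused

if __name__ == "__main__":
    n = 8
    img_feat, txt_feat = torch.randn(n, 2048), torch.randn(n, 512)
    img_disc, txt_disc = torch.randn(n, 64), torch.randn(n, 64)
    S = fused_similarity(img_feat, txt_feat, img_disc, txt_disc)
    print(S.shape)  # torch.Size([8, 8])

In this sketch, the resulting matrix S could serve as a soft supervision signal for contrastive hash learning; the actual supervision used by SACH is detailed in the full paper.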
