Abstract

In unsupervised cross-modal hashing, two notable issues require attention: the inter- and intra-modal similarity matrices in the original and Hamming spaces lack sufficient neighborhood information and semantic consistency, and relying solely on the reconstruction of instance-level similarity matrices fails to capture the global intrinsic correlation and manifold structure of the training samples. We propose a novel method that combines multi-similarity reconstruction with clustering-based contrastive hashing. First, we construct image-feature, text-feature, and joint-semantic multi-similarity matrices in the original space, together with the corresponding hash-code similarity matrices in the Hamming space, to enhance the semantic consistency of the inter- and intra-modal reconstructions. Second, clustering-based contrastive hashing is proposed to capture the global intrinsic correlation and manifold structure of the image-text pairs. Extensive experimental results on Wiki, NUS-WIDE, MIRFlickr-25K, and MS-COCO demonstrate the promising cross-modal retrieval performance of the proposed method.
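To make the two components concrete, the sketch below shows one plausible instantiation in NumPy/scikit-learn. It is illustrative only: the cosine similarity, the fusion weight `alpha`, the k-means pseudo-labels, the InfoNCE-style objective, the temperature `tau`, and all function names are assumptions made for exposition; the abstract does not specify the paper's actual formulation.

```python
import numpy as np
from sklearn.cluster import KMeans


def cosine_sim(x):
    """Row-wise cosine-similarity matrix for a feature matrix x of shape (n, d)."""
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    return x @ x.T


def multi_similarity_reconstruction(feat_img, feat_txt, codes, alpha=0.5):
    """Align original-space similarity matrices with the Hamming-space
    code-similarity matrix via a mean-squared reconstruction loss.

    feat_img : (n, d_i) image features; feat_txt : (n, d_t) text features
    codes    : (n, c) relaxed hash codes in [-1, 1]
    alpha    : hypothetical fusion weight for the joint-semantic matrix
    """
    s_img = cosine_sim(feat_img)                     # intra-modal (image)
    s_txt = cosine_sim(feat_txt)                     # intra-modal (text)
    s_joint = alpha * s_img + (1.0 - alpha) * s_txt  # joint-semantic fusion
    s_code = codes @ codes.T / codes.shape[1]        # Hamming-space similarity

    return sum(np.mean((s - s_code) ** 2) for s in (s_img, s_txt, s_joint))


def cluster_contrastive_loss(codes, feat_img, feat_txt, n_clusters=10, tau=0.5):
    """InfoNCE-style loss whose positives come from k-means pseudo-labels on
    the concatenated image/text features, pulling same-cluster codes together."""
    fused = np.concatenate([feat_img, feat_txt], axis=1)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(fused)

    z = codes / (np.linalg.norm(codes, axis=1, keepdims=True) + 1e-12)
    logits = z @ z.T / tau
    np.fill_diagonal(logits, -np.inf)                # exclude self-pairs
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    pos = labels[:, None] == labels[None, :]         # same-cluster mask
    np.fill_diagonal(pos, False)
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0                                # skip singleton clusters
    per_sample = -np.where(pos, log_prob, 0.0).sum(axis=1)
    return (per_sample[valid] / n_pos[valid]).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fi = rng.normal(size=(64, 128))                  # toy image features
    ft = rng.normal(size=(64, 64))                   # toy text features
    b = np.tanh(rng.normal(size=(64, 32)))           # relaxed 32-bit codes
    print("reconstruction loss:", multi_similarity_reconstruction(fi, ft, b))
    print("contrastive loss:", cluster_contrastive_loss(b, fi, ft))
```

In a full pipeline, the relaxed codes would typically be produced by modality-specific hashing networks trained on a weighted sum of the two objectives and binarized with a sign function at retrieval time; the sketch above only evaluates the losses on toy data.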
