Abstract

Cross-modal hashing maps heterogeneous multimedia data into a common Hamming space to retrieve relevant samples across modalities, and has received great research interest due to its fast retrieval and low storage cost. In real-world applications, because of the high manual annotation cost of multimedia data, only a limited number of labeled samples are available alongside abundant unlabeled data. In recent years, several semi-supervised cross-modal hashing (SCH) methods have been presented. However, existing SCH works have not well studied how to fully explore and jointly exploit modality-specific (complementary) and modality-shared (correlated) information for retrieval. In this paper, we propose a novel SCH approach named Modality-specific and Cross-modal Graph Convolutional Networks (MCGCN). The network architecture contains two modality-specific channels and a cross-modal channel that learn modality-specific and shared representations for each modality, respectively. A graph convolutional network (GCN) is leveraged in each of the three channels to explore intra-modal and inter-modal similarity and to propagate semantic information from labeled data to unlabeled data. The modality-specific and shared representations of each modality are fused with an attention scheme. To further reduce the modality gap, a discriminative model is designed that learns to classify the modality of each representation, and network training is guided by an adversarial scheme. Experiments on two widely used multi-modal datasets demonstrate that MCGCN outperforms state-of-the-art semi-supervised and supervised cross-modal hashing methods.
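
As a rough illustration of the architecture described above, the following PyTorch sketch wires together two modality-specific GCN channels, a cross-modal GCN channel, attention-based fusion of specific and shared representations, a tanh hash layer, and a modality discriminator for adversarial training. The layer dimensions, the graph construction (a placeholder identity adjacency here), and the class names (GCNLayer, ModalityChannel, MCGCNSketch) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an MCGCN-style model; all sizes and names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(A_hat @ H @ W), where A_hat is a
    pre-normalized similarity/adjacency matrix over the mini-batch."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, a_hat):
        return F.relu(self.linear(a_hat @ x))


class ModalityChannel(nn.Module):
    """Two-layer GCN channel producing a (specific or shared) representation."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.gcn1 = GCNLayer(in_dim, hid_dim)
        self.gcn2 = GCNLayer(hid_dim, out_dim)

    def forward(self, x, a_hat):
        return self.gcn2(self.gcn1(x, a_hat), a_hat)


class MCGCNSketch(nn.Module):
    """Two modality-specific channels + one cross-modal channel, attention
    fusion, tanh-relaxed hash codes, and a modality discriminator."""
    def __init__(self, img_dim=4096, txt_dim=1386, hid_dim=512, code_len=64):
        super().__init__()
        self.img_channel = ModalityChannel(img_dim, hid_dim, code_len)
        self.txt_channel = ModalityChannel(txt_dim, hid_dim, code_len)
        # Cross-modal channel operates on concatenated image/text features.
        self.shared_channel = ModalityChannel(img_dim + txt_dim, hid_dim, code_len)
        # Attention weights for fusing specific and shared representations.
        self.attn = nn.Linear(2 * code_len, 2)
        # Discriminator predicts which modality a representation came from.
        self.discriminator = nn.Sequential(
            nn.Linear(code_len, hid_dim), nn.ReLU(), nn.Linear(hid_dim, 2))

    def fuse(self, specific, shared):
        w = torch.softmax(self.attn(torch.cat([specific, shared], dim=1)), dim=1)
        return w[:, :1] * specific + w[:, 1:] * shared

    def forward(self, img, txt, a_img, a_txt, a_joint):
        img_spec = self.img_channel(img, a_img)    # modality-specific (image)
        txt_spec = self.txt_channel(txt, a_txt)    # modality-specific (text)
        shared = self.shared_channel(torch.cat([img, txt], dim=1), a_joint)
        img_code = torch.tanh(self.fuse(img_spec, shared))  # relaxed hash codes
        txt_code = torch.tanh(self.fuse(txt_spec, shared))
        return img_code, txt_code


if __name__ == "__main__":
    # Toy usage with random data (batch of 8, hypothetical feature sizes).
    n, img_dim, txt_dim = 8, 4096, 1386
    img, txt = torch.randn(n, img_dim), torch.randn(n, txt_dim)
    a = torch.eye(n)  # placeholder adjacency; a real graph would use kNN similarity
    model = MCGCNSketch(img_dim, txt_dim)
    b_img, b_txt = model(img, txt, a, a, a)
    # Adversarial step (sketch): the discriminator classifies code modality.
    logits = model.discriminator(torch.cat([b_img, b_txt], dim=0))
    labels = torch.cat([torch.zeros(n), torch.ones(n)]).long()
    d_loss = F.cross_entropy(logits, labels)
    print(b_img.shape, b_txt.shape, float(d_loss))
```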
