Abstract

Deep cross-modal hashing provides an effective and efficient unified representation learning solution for cross-modal search. However, existing methods neglect the implicit fine-grained multimodal knowledge relations between modalities, such as when an image contains information that is not directly described in its paired text. To tackle this problem, we propose a novel self-supervised multi-grained multi-modal knowledge graph contrastive hashing method for cross-modal search (CMGCH). Firstly, to capture implicit fine-grained cross-modal semantic associations, a multi-modal knowledge graph is constructed, which represents the implicit multimodal knowledge relations between image and text as inter-modal and intra-modal semantic associations. Secondly, a cross-modal graph contrastive attention network is proposed to reason over the multi-modal knowledge graph and fully learn the implicit fine-grained inter-modal and intra-modal knowledge relations. Thirdly, a cross-modal multi-granularity contrastive embedding learning mechanism is proposed, which fuses the global coarse-grained and local fine-grained embeddings via a multi-head attention mechanism for inter-modal and intra-modal contrastive learning, thereby yielding cross-modal unified representations with stronger discriminativeness and better semantic-consistency preservation. Through joint intra-modal and inter-modal contrastive training, both the modality-invariant and the modality-specific information of the different modalities are maintained in the final unified cross-modal hash space. Extensive experiments on several cross-modal benchmark datasets demonstrate that the proposed CMGCH outperforms state-of-the-art methods.
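As a rough illustration of the multi-granularity contrastive hashing idea summarized above, the PyTorch-style sketch below fuses a global embedding with local fine-grained embeddings via multi-head attention, relaxes the hash codes with tanh, and combines an inter-modal with an intra-modal InfoNCE loss. This is a minimal sketch under stated assumptions, not the authors' implementation: the module names (MultiGrainedFusion, info_nce), dimensions, residual fusion step, and random example tensors are all illustrative choices.

```python
# Illustrative sketch only; assumes global + local features per modality.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiGrainedFusion(nn.Module):
    """Fuse a global embedding with local (fine-grained) embeddings via attention."""

    def __init__(self, dim: int, hash_bits: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.hash_head = nn.Linear(dim, hash_bits)

    def forward(self, global_feat: torch.Tensor, local_feats: torch.Tensor) -> torch.Tensor:
        # global_feat: (B, D); local_feats: (B, N, D), e.g. region or token features
        query = global_feat.unsqueeze(1)                    # (B, 1, D)
        fused, _ = self.attn(query, local_feats, local_feats)
        fused = fused.squeeze(1) + global_feat              # residual fusion (an assumption)
        return torch.tanh(self.hash_head(fused))            # relaxed hash codes in (-1, 1)


def info_nce(a: torch.Tensor, b: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE: matched pairs (a_i, b_i) are positives, other pairs negatives."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / tau
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


# Usage with random stand-in features: inter-modal + intra-modal contrastive objective.
B, N, D, K = 8, 16, 512, 64
img_fusion, txt_fusion = MultiGrainedFusion(D, K), MultiGrainedFusion(D, K)
img_codes = img_fusion(torch.randn(B, D), torch.randn(B, N, D))   # image hash codes
txt_codes = txt_fusion(torch.randn(B, D), torch.randn(B, N, D))   # text hash codes
img_aug = img_fusion(torch.randn(B, D), torch.randn(B, N, D))     # augmented image view

loss = info_nce(img_codes, txt_codes) + info_nce(img_codes, img_aug)  # inter- + intra-modal
binary_codes = torch.sign(img_codes)  # final binary codes used at retrieval time
```

The two loss terms mirror the joint training described in the abstract: the inter-modal term aligns paired image and text codes, while the intra-modal term keeps modality-specific structure, and sign() binarizes the relaxed codes for hashing-based search.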
