Abstract

Representation learning of knowledge graphs (KGs) aims to embed entities and relations as vectors in a continuous low-dimensional space, facilitating applications such as link prediction and entity retrieval. Most existing KG embedding methods model structured fact triples independently, ignoring both the multi-type relations among triples and the variety of data types (e.g., texts and images) associated with entities, and thus fail to capture the complex, multi-modal information inherent in entity-relation triples. In this paper, we propose a novel knowledge graph embedding approach, the Contrastive Multi-modal Graph Neural Network (CMGNN), which encapsulates comprehensive features from both the multi-modal content descriptions of entities and their high-order connectivity structures. Specifically, CMGNN first learns entity embeddings from multi-modal content and then contrasts encodings from multi-relational local neighbors against encodings from high-order connectivities, obtaining latent representations of entities and relations simultaneously. Experimental results demonstrate that CMGNN effectively models the multi-modal and multi-type structures in KGs and significantly outperforms state-of-the-art methods on benchmark datasets for link prediction and entity classification.
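To make the contrastive step concrete, the following is a minimal PyTorch sketch of the kind of objective the abstract describes: an InfoNCE-style loss that aligns the two views of each entity (its multi-relational local-neighbor encoding and its high-order connectivity encoding). The function name, the temperature value, and the use of cosine similarity are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(local_enc: torch.Tensor,
                     high_order_enc: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style loss over two views of the same entities.

    local_enc, high_order_enc: (num_entities, dim) tensors holding, for each
    entity, its local-neighbor encoding and its high-order connectivity
    encoding. Matching rows are treated as positive pairs; all other rows
    serve as in-batch negatives.
    """
    # Cosine similarity between every local/high-order pair.
    z1 = F.normalize(local_enc, dim=-1)
    z2 = F.normalize(high_order_enc, dim=-1)
    logits = z1 @ z2.t() / temperature  # (N, N) similarity matrix

    # The diagonal entries are the positives: view i of entity i
    # should match view i' of the same entity.
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

# Example: 128 entities embedded in a 64-dimensional space.
local = torch.randn(128, 64)
high_order = torch.randn(128, 64)
print(contrastive_loss(local, high_order))
```

Under this framing, the encoder producing each view is trained so that an entity's two structural views agree, which is one standard way to realize the "contrasts encodings" step described above.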
