MLSFF: Multi-level structural features fusion for multi-modal knowledge graph completion.

Hanming Zhai, Xiaojun Lv, Xin Tong, Zhiwen Hou, Fanliang Bu

doi:10.3934/mbe.2023630

Abstract

With the rise of multi-modal methods, multi-modal knowledge graphs have become a better choice for storing human knowledge. However, knowledge graphs are often incomplete because knowledge is unbounded and constantly updated, which motivates the task of knowledge graph completion. Existing multi-modal knowledge graph completion methods mostly rely on either embedding-based representations or graph neural networks, and there is still room for improvement in interpretability and in the ability to handle multi-hop tasks. Therefore, we propose a new method for multi-modal knowledge graph completion. Our method learns multi-level graph structural features to fully explore hidden relationships within the knowledge graph and to improve reasoning accuracy. Specifically, we first use a Transformer architecture to learn separate data representations for the image and text modalities. Then, using multi-modal gating units, we filter out irrelevant information and perform feature fusion to obtain a unified encoding of knowledge representations. Furthermore, we extract multi-level path features using a width-adjustable sliding window and learn structural features of the knowledge graph using graph convolutional operations. Finally, we use a scoring function to evaluate the probability that an encoded triplet is true and to complete the prediction task. To demonstrate the effectiveness of the model, we conduct experiments on two publicly available datasets, FB15K-237-IMG and WN18-IMG, and achieve improvements of 1.8% and 0.7%, respectively, in the Hits@1 metric.
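The abstract does not give the exact form of the gating unit or the sliding window, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a single scalar sigmoid gate over the concatenated text and image vectors, and it treats a path as a plain list of node labels from which contiguous sub-paths are cut at several window widths. The names gated_fusion, sliding_windows, and multi_level_paths are hypothetical.

```python
import math


def gated_fusion(text_vec, image_vec, gate_weights, gate_bias=0.0):
    """Fuse two equal-length vectors with one scalar sigmoid gate.

    Assumed form (not taken from the paper):
        g = sigmoid(w . [text; image] + b)
        fused = g * text + (1 - g) * image
    """
    if len(text_vec) != len(image_vec):
        raise ValueError("text and image vectors must have the same length")
    concat = list(text_vec) + list(image_vec)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated length")
    z = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    g = 1.0 / (1.0 + math.exp(-z))
    return [g * t + (1.0 - g) * v for t, v in zip(text_vec, image_vec)]


def sliding_windows(path, width):
    """Return every contiguous sub-path of `width` nodes, in order."""
    if width < 1:
        raise ValueError("width must be positive")
    return [tuple(path[i:i + width]) for i in range(len(path) - width + 1)]


def multi_level_paths(path, widths):
    """Collect sub-paths at several window widths (one 'level' per width)."""
    return {w: sliding_windows(path, w) for w in widths}


if __name__ == "__main__":
    # Toy path of hypothetical entity labels.
    path = ["e1", "e2", "e3", "e4"]
    for width, windows in multi_level_paths(path, [2, 3]).items():
        print(width, windows)
    # With all-zero gate weights the gate is 0.5, so both modalities
    # contribute equally to the fused vector.
    print(gated_fusion([1.0, 1.0], [0.0, 0.0], [0.0, 0.0, 0.0, 0.0]))
```

Widening the window yields longer sub-paths, which is one plausible reading of "multi-level" path features; in the method described above, such sub-paths would then be passed to graph convolutional layers rather than used directly.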

Full Text