Abstract

The visual modality has recently attracted extensive attention in the knowledge graph and multimedia communities, because much real-world knowledge is multi-modal in nature. However, it is currently unclear to what extent the visual modality can improve the performance of knowledge graph tasks over unimodal models, and treating structural and visual features equally may encode too much irrelevant information from images. In this paper, we probe the utility of auxiliary visual context from the perspective of knowledge graph representation learning by designing a Relation Sensitive Multi-modal Embedding model, RSME for short. RSME can automatically encourage or filter the influence of visual context during representation learning. We also examine the effect of different visual feature encoders. Experimental results validate the superiority of our approach compared to state-of-the-art methods. On the basis of in-depth analysis, we conclude that under appropriate circumstances models are capable of leveraging the visual input to generate better knowledge graph embeddings, while under unsuitable circumstances the visual input can be unhelpful or even harmful.
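
The abstract describes a relation-sensitive mechanism that decides how much visual context to keep when building entity representations. The following is a minimal sketch of that idea, not the authors' released implementation: it assumes a TransE-style scorer, a scalar gate learned per relation, and frozen image features projected into the embedding space. All class and parameter names are illustrative assumptions.

```python
# Minimal sketch (assumed design, not the official RSME code): a per-relation gate
# controls how much of an entity's visual embedding is mixed into its structural
# embedding before a TransE-style score is computed.
import torch
import torch.nn as nn


class RelationSensitiveGate(nn.Module):
    """Learns one scalar gate per relation; sigmoid keeps it in (0, 1)."""

    def __init__(self, num_relations: int):
        super().__init__()
        self.gate_logits = nn.Parameter(torch.zeros(num_relations))

    def forward(self, rel_ids: torch.Tensor) -> torch.Tensor:
        # Shape (batch, 1) so the gate broadcasts over the embedding dimension.
        return torch.sigmoid(self.gate_logits[rel_ids]).unsqueeze(-1)


class GatedMultiModalKGE(nn.Module):
    def __init__(self, num_entities: int, num_relations: int,
                 dim: int, visual_dim: int):
        super().__init__()
        self.ent = nn.Embedding(num_entities, dim)
        self.rel = nn.Embedding(num_relations, dim)
        # Placeholder for frozen visual features from a pretrained image encoder;
        # random values stand in for them in this sketch.
        self.visual = nn.Embedding(num_entities, visual_dim)
        self.visual.weight.requires_grad = False
        self.proj = nn.Linear(visual_dim, dim)  # map image features into KG space
        self.gate = RelationSensitiveGate(num_relations)

    def entity_repr(self, ent_ids: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
        structural = self.ent(ent_ids)
        visual = self.proj(self.visual(ent_ids))
        # The relation decides how much visual context is kept vs. filtered out.
        return structural + gate * visual

    def score(self, heads: torch.Tensor, rels: torch.Tensor,
              tails: torch.Tensor) -> torch.Tensor:
        g = self.gate(rels)
        h = self.entity_repr(heads, g)
        t = self.entity_repr(tails, g)
        r = self.rel(rels)
        # TransE-style distance: higher score means a more plausible triple.
        return -torch.norm(h + r - t, p=1, dim=-1)


# Toy usage: score a single (head, relation, tail) triple.
model = GatedMultiModalKGE(num_entities=100, num_relations=10, dim=64, visual_dim=512)
print(model.score(torch.tensor([3]), torch.tensor([1]), torch.tensor([7])))
```

Swapping the image encoder that produces the frozen visual features (e.g., different pretrained CNN or transformer backbones) is one way to examine the effect of different visual feature encoders mentioned in the abstract.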
