Abstract

We examine the embedding approach to reasoning new relational facts from a large-scale knowledge graph and a text corpus. We propose a novel method of jointly embedding entities and words into the same continuous vector space. The embedding process attempts to preserve the relations between entities in the knowledge graph and the co-occurrences of words in the text corpus. Entity names and Wikipedia anchors are utilized to align the embeddings of entities and words in the same space. Large-scale experiments on Freebase and a Wikipedia/NY Times corpus show that joint embedding brings promising improvements in the accuracy of predicting facts, compared to separately embedding knowledge graphs and text. In particular, joint embedding enables the prediction of facts containing entities outside the knowledge graph, which cannot be handled by previous embedding methods. At the same time, concerning the quality of the word embeddings, experiments on the analogical reasoning task show that joint embedding is comparable to or slightly better than word2vec (Skip-Gram).
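The alignment idea in the abstract (entities and words sharing one vector space, tied together through entity names) can be illustrated with a toy sketch. This is not the paper's actual model or training objective; the vocabulary, update rule, and learning rate below are hypothetical, standing in for the alignment-by-entity-names component:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 20  # embedding dimension (hypothetical)

# One shared space holds both word and entity vectors (toy vocabulary).
vocab = ["barack", "obama", "honolulu", "president"]
word_vec = {w: rng.normal(scale=0.1, size=k) for w in vocab}
entity_vec = {"Barack_Obama": rng.normal(scale=0.1, size=k)}

def align_by_name(entity, name_words, lr=0.5):
    """Nudge an entity vector toward the mean of its name's word vectors,
    a crude stand-in for aligning the two embedding spaces via names."""
    target = np.mean([word_vec[w] for w in name_words], axis=0)
    entity_vec[entity] += lr * (target - entity_vec[entity])

name = [word_vec["barack"], word_vec["obama"]]
before = np.linalg.norm(entity_vec["Barack_Obama"] - np.mean(name, axis=0))
align_by_name("Barack_Obama", ["barack", "obama"])
after = np.linalg.norm(entity_vec["Barack_Obama"] - np.mean(name, axis=0))
# After the update, the entity vector is strictly closer to its name words.
```

Because words and entities end up in one space, a fact about an entity never seen in the knowledge graph can still be scored through the word vectors of its name, which is what makes out-of-graph prediction possible.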

Highlights

  • Knowledge graphs such as Freebase (Bollacker et al., 2008) and WordNet (Miller, 1995) have become important resources for many AI and NLP applications such as question answering

  • A knowledge graph is a collection of relational facts that are often represented in the form of a triplet, e.g., “(Obama, Born-in, Honolulu)”

  • Targeting knowledge graph completion, a promising embedding paradigm was proposed, which is able to reason new facts from the knowledge graph alone (Bordes et al., 2011; Bordes et al., 2013; Socher et al., 2013; Wang et al., 2014). In this series of methods, each entity is represented as a k-dimensional vector and each relation is characterized by an operation in ℝ^k, so that a candidate fact can be asserted by simple vector operations


Summary

Introduction

Knowledge graphs such as Freebase (Bollacker et al., 2008) and WordNet (Miller, 1995) have become important resources for many AI and NLP applications such as question answering. Targeting knowledge graph completion, a promising embedding paradigm was proposed, which is able to reason new facts from the knowledge graph alone (Bordes et al., 2011; Bordes et al., 2013; Socher et al., 2013; Wang et al., 2014). In this series of methods, each entity is represented as a k-dimensional vector and each relation is characterized by an operation in ℝ^k, so that a candidate fact can be asserted by simple vector operations. The vector of an entity may encode global information from the entire graph, and scoring a candidate fact by designed vector operations plays a role similar to long-range “reasoning” in the graph. Since scoring a candidate fact requires the vectors of both entities, this type of method can only complete missing facts for which both entities exist in the knowledge graph. How to handle facts involving entities outside the knowledge graph is a significant obstacle to widely applying the embedding paradigm.
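The scoring scheme described above can be sketched with a translation-based score in the style of Bordes et al. (2013), where a triplet (h, r, t) is plausible when the tail vector lies near h + r. The entities, relations, dimension, and random initialization below are all hypothetical; a real system would learn these vectors from the graph:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 50  # embedding dimension (hypothetical)

# Toy entity and relation vectors; in practice these are trained on Freebase.
entities = {name: rng.normal(scale=0.1, size=k)
            for name in ["Obama", "Honolulu", "Hawaii"]}
relations = {name: rng.normal(scale=0.1, size=k)
             for name in ["Born-in", "Located-in"]}

def score(head, rel, tail):
    """Translation-based plausibility: higher (less negative) score means
    the tail vector is closer to head + relation."""
    h, r, t = entities[head], relations[rel], entities[tail]
    return -np.linalg.norm(h + r - t, ord=1)

# A candidate fact is asserted by comparing scores over alternative tails.
candidates = ["Honolulu", "Hawaii"]
best = max(candidates, key=lambda t: score("Obama", "Born-in", t))
```

Note that `score` needs vectors for both entities, which is exactly the limitation the paragraph describes: a fact mentioning an entity absent from `entities` cannot be scored at all by graph-only embedding methods.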

