Abstract
Automatic construction of knowledge graphs (KGs) from unstructured text has received considerable attention in recent research, resulting in the construction of several KGs with millions of entities (nodes) and facts (edges) among them. Unfortunately, such KGs tend to be severely sparse in the number of facts known for a given entity, i.e., they have low knowledge density. For example, the NELL KG contains only 1.34 facts per entity, and such low knowledge density makes it challenging to use these KGs in real-world applications. In contrast to the best-effort extraction paradigm followed in the construction of such KGs, in this paper we argue in favor of ENTIty Centric Expansion (ENTICE), an entity-centric KG population framework, to alleviate the low knowledge density problem in existing KGs. By using ENTICE, we are able to increase NELL’s knowledge density by a factor of 7.7 at 75.5% accuracy. Additionally, we are also able to extend the ontology, discovering new relations and entities.
Highlights
Over the last few years, automatic construction of knowledge graphs (KGs) from web-scale text data has received considerable attention, resulting in the construction of several large KGs such as NELL (Mitchell et al., 2015) and Google’s Knowledge Vault (Dong et al., 2014)
Our goal here is to draw attention to the effectiveness of entity-centric approaches with broader scope for improving knowledge density, and to show that even relatively straightforward techniques can go a long way in alleviating low knowledge density in existing state-of-the-art KGs
Integrating with the Knowledge Graph: Based on evaluation over a random sample, we find that entity linking in ENTIty Centric Expansion (ENTICE) is 92% accurate, while relation linking is about 70% accurate
Summary
Over the last few years, automatic construction of knowledge graphs (KGs) from web-scale text data has received considerable attention, resulting in the construction of several large KGs such as NELL (Mitchell et al., 2015) and Google’s Knowledge Vault (Dong et al., 2014). These KGs consist of millions of entities and facts involving them. Construction of such KGs tends to follow a batch paradigm: the knowledge extraction system makes a full pass over the text corpus, extracting whatever knowledge it finds and aggregating all extractions into a graph. Such a best-effort extraction paradigm has proved inadequate for addressing the low knowledge density issue mentioned above.
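The knowledge-density metric discussed above is simply the number of facts (edges) divided by the number of distinct entities (nodes). The following minimal sketch illustrates the computation over a toy triple set; the triples are invented for illustration and are not actual NELL data:

```python
# Toy KG as a list of (subject, relation, object) triples.
# These triples are illustrative only, not drawn from NELL.
triples = [
    ("Pittsburgh", "cityLocatedInState", "Pennsylvania"),
    ("Pittsburgh", "cityLiesOnRiver", "Ohio River"),
    ("Pennsylvania", "stateLocatedInCountry", "USA"),
]

# Distinct entities are the union of all subjects and objects.
entities = {e for s, _, o in triples for e in (s, o)}

# Knowledge density = facts per entity.
density = len(triples) / len(entities)
print(f"{density:.2f} facts per entity")  # 3 facts / 4 entities
```

By this measure, NELL's reported density of 1.34 facts per entity means each entity participates in fewer than two known facts on average, which is what ENTICE's entity-centric expansion aims to improve.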