Abstract

Open Information Extraction (OpenIE) methods are effective at extracting (noun phrase, relation phrase, noun phrase) triples from text, e.g., (Barack Obama, took birth in, Honolulu). Organization of such triples in the form of a graph with noun phrases (NPs) as nodes and relation phrases (RPs) as edges results in the construction of Open Knowledge Graphs (OpenKGs). In order to use such OpenKGs in downstream tasks, it is often desirable to learn embeddings of the NPs and RPs present in the graph. Even though several Knowledge Graph (KG) embedding methods have been recently proposed, all of those methods have targeted Ontological KGs, as opposed to OpenKGs. Straightforward application of existing Ontological KG embedding methods to OpenKGs is challenging, as unlike Ontological KGs, OpenKGs are not canonicalized, i.e., a real-world entity may be represented using multiple nodes in the OpenKG, with each node corresponding to a different NP referring to the entity. For example, nodes with labels Barack Obama, Obama, and President Obama may refer to the same real-world entity Barack Obama. Even though canonicalization of OpenKGs has received some attention lately, output of such methods has not been used to improve OpenKG embeddings. We fill this gap in the paper and propose Canonicalization-infused Representations (CaRe) for OpenKGs. Through extensive experiments, we observe that CaRe enables existing models to adapt to the challenges in OpenKGs and achieve substantial improvements for the link prediction task. © 2019 Association for Computational Linguistics

Highlights

  • Open Information Extraction (OpenIE) methods such as ReVerb (Fader et al., 2011), OLLIE (Mausam et al., 2012), BONIE (Saha et al., 2017) and CALMIE (Saha and Mausam, 2018) can automatically extract (noun phrase, relation phrase, noun phrase) triples from text

  • We propose Canonicalization-infused Representations (CaRe) for Open Knowledge Graphs (OpenKGs): a novel approach that enriches OpenKG embedding models with the output of a canonicalization model

  • Notations: OpenKG is denoted as G = (N, R, T+), where N and R are the set of noun phrases (NPs) and relation phrases (RPs), respectively, and T+ = {(s, r, o)|s ∈ N, r ∈ R, o ∈ N} is the set of observed triples
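
The notation above can be illustrated with a small data-structure sketch. This is only an illustration, not code from the CaRe repository: the names `Triple`, `OpenKG`, and `build_open_kg` are hypothetical, the second example triple is invented, and deriving N and R from the observed triples is an assumption made here for brevity.

```python
from typing import Iterable, NamedTuple, Tuple


class Triple(NamedTuple):
    s: str  # subject noun phrase, s in N
    r: str  # relation phrase, r in R
    o: str  # object noun phrase, o in N


class OpenKG(NamedTuple):
    noun_phrases: frozenset      # N
    relation_phrases: frozenset  # R
    triples: frozenset           # T+, the observed triples


def build_open_kg(raw: Iterable[Tuple[str, str, str]]) -> OpenKG:
    """Build G = (N, R, T+) from raw (s, r, o) tuples.

    Assumption: N and R are taken to be exactly the phrases that
    occur in the observed triples.
    """
    t_plus = frozenset(Triple(*t) for t in raw)
    n = frozenset(np for t in t_plus for np in (t.s, t.o))
    r = frozenset(t.r for t in t_plus)
    return OpenKG(n, r, t_plus)


kg = build_open_kg([
    ("Barack Obama", "took birth in", "Honolulu"),
    ("Obama", "was born in", "Honolulu"),  # hypothetical second triple
])
```

Note that "Barack Obama" and "Obama" end up as two distinct members of N here, which is exactly the non-canonicalized situation the paper targets.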


Summary

Introduction

Open Information Extraction (OpenIE) methods such as ReVerb (Fader et al., 2011), OLLIE (Mausam et al., 2012), BONIE (Saha et al., 2017) and CALMIE (Saha and Mausam, 2018) can automatically extract (noun phrase, relation phrase, noun phrase) triples from text. Existing KG embedding models learn a representation for each node and edge label from the triples in which it appears. This is suitable for Ontological KGs because they are canonicalized. The same paradigm is ineffective for OpenKGs, where a single real-world entity may be represented by several nodes. A possible solution is to canonicalize the OpenKG. This involves identifying NPs and RPs that refer to the same entity and relation, and assigning them unique IDs. Nodes in the OpenKG that have the same ID are then merged, leading to a clean, canonicalized graph. Instead of explicitly merging nodes with common IDs, however, KG embedding models can be designed to judiciously account for mistakes made during the canonicalization step. Towards establishing this premise, we propose a flexible OpenKG embedding approach that integrates and utilizes the output of a canonicalization model in an error-conscious manner. CaRe source code is available at https://github.com/malllabiisc/CaRE
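
The explicit-merging option described above (assign IDs, then merge nodes that share an ID) might be sketched as follows. This is a minimal illustration of that baseline only, not of CaRe itself, which avoids hard merging. The function name `merge_by_canonical_id`, the cluster IDs (`E1`, `E2`, `R1`), and the second triple are all hypothetical; a real canonicalization model would produce the ID mappings.

```python
from typing import Dict, Iterable, Set, Tuple

RawTriple = Tuple[str, str, str]


def merge_by_canonical_id(
    triples: Iterable[RawTriple],
    np_ids: Dict[str, str],
    rp_ids: Dict[str, str],
) -> Set[RawTriple]:
    """Rewrite every triple over canonical IDs, merging duplicates.

    Phrases without an assigned ID are kept as-is (an assumption
    made here so that the sketch never drops a triple).
    """
    merged = set()
    for s, r, o in triples:
        merged.add((np_ids.get(s, s), rp_ids.get(r, r), np_ids.get(o, o)))
    return merged


# Hypothetical output of a canonicalization model.
np_ids = {
    "Barack Obama": "E1",
    "Obama": "E1",
    "President Obama": "E1",
    "Honolulu": "E2",
}
rp_ids = {"took birth in": "R1", "was born in": "R1"}

canonical = merge_by_canonical_id(
    [
        ("Barack Obama", "took birth in", "Honolulu"),
        ("Obama", "was born in", "Honolulu"),  # hypothetical triple
    ],
    np_ids,
    rp_ids,
)
```

Both input triples collapse into a single canonical triple. A wrong entry in `np_ids` would silently fuse unrelated entities, which is the kind of canonicalization mistake the error-conscious design is meant to tolerate.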

Related Work
Background
Overview
Step 1
Step 2
Datasets
Open KG Link Prediction Evaluation
Experimental Setup
Overall Performance
Impact of parameterizing RP embeddings
Different ways to utilize Canonicalization edges
Qualitative Analysis
Conclusion
