Abstract

Pre-trained language representation models (PLMs) struggle to capture factual knowledge from text. In contrast, knowledge embedding (KE) methods can effectively represent the relational facts in knowledge graphs (KGs) with informative entity embeddings, but conventional KE models cannot take full advantage of the abundant textual information. In this paper, we propose a unified model for Knowledge Embedding and Pre-trained LanguagE Representation (KEPLER), which can not only better integrate factual knowledge into PLMs but also produce effective text-enhanced KE with the strong PLMs. In KEPLER, we encode textual entity descriptions with a PLM as their embeddings, and then jointly optimize the KE and language modeling objectives. Experimental results show that KEPLER achieves state-of-the-art performance on various NLP tasks, and also works remarkably well as an inductive KE model on KG link prediction. Furthermore, for pre-training and evaluating KEPLER, we construct Wikidata5M, a large-scale KG dataset with aligned entity descriptions, and benchmark state-of-the-art KE methods on it. It shall serve as a new KE benchmark and facilitate research on large KGs, inductive KE, and KGs with text. The source code can be obtained from https://github.com/THU-KEG/KEPLER.
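A minimal PyTorch-style sketch of the joint objective described above (hypothetical class and field names, not the authors' released code), assuming a RoBERTa MLM backbone, the first-token representation of an entity description as its entity embedding, and a TransE-style score with one sampled negative tail per triple:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM

class KeplerSketch(nn.Module):
    """Sketch of a joint KE + MLM objective with description-based entity embeddings."""

    def __init__(self, model_name="roberta-base", num_relations=1000, gamma=4.0):
        super().__init__()
        self.lm = AutoModelForMaskedLM.from_pretrained(model_name)
        hidden = self.lm.config.hidden_size
        # num_relations = number of relation types in the KG (set per dataset).
        self.rel_emb = nn.Embedding(num_relations, hidden)  # relation lookup table
        self.gamma = gamma  # margin of the TransE-style score

    def encode_entity(self, input_ids, attention_mask):
        # Entity embedding = hidden state of the first (<s>) token of its description.
        out = self.lm(input_ids=input_ids, attention_mask=attention_mask,
                      output_hidden_states=True)
        return out.hidden_states[-1][:, 0]

    def forward(self, mlm_batch, ke_batch):
        # MLM loss on ordinary text (mlm_batch holds input_ids, attention_mask, labels).
        mlm_loss = self.lm(**mlm_batch).loss

        # KE loss on (head, relation, tail) triples with description-encoded entities.
        h = self.encode_entity(ke_batch["head_ids"], ke_batch["head_mask"])
        t = self.encode_entity(ke_batch["tail_ids"], ke_batch["tail_mask"])
        t_neg = self.encode_entity(ke_batch["neg_tail_ids"], ke_batch["neg_tail_mask"])
        r = self.rel_emb(ke_batch["rel_ids"])

        pos = self.gamma - torch.norm(h + r - t, p=2, dim=-1)      # true triples: high
        neg = self.gamma - torch.norm(h + r - t_neg, p=2, dim=-1)  # corrupted: low
        ke_loss = -(F.logsigmoid(pos) + F.logsigmoid(-neg)).mean()

        return mlm_loss + ke_loss  # joint objective: KE loss + MLM loss
```

Because the entity embedding is recomputed from text, the same encoder serves both as a knowledge-enhanced PLM and as a text-based KE model.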

Highlights

  • Pre-trained language representation models (PLMs) struggle to capture factual knowledge from text

  • We propose KEPLER, a unified model for Knowledge Embedding and Pre-trained LanguagE Representation

  • We introduce Wikidata5M, a new large-scale knowledge graph (KG) dataset, which shall promote research on large-scale KGs, inductive knowledge embedding (KE), and the interactions between KGs and natural language processing (NLP)


Summary

Introduction

Pre-trained language representation models (PLMs) struggle to capture factual knowledge from text. For pre-training and evaluating KEPLER, we construct Wikidata5M, a large-scale KG dataset with aligned entity descriptions, and benchmark state-of-the-art KE methods on it. It shall serve as a new KE benchmark and facilitate research on large KGs, inductive KE, and KGs with text. For pre-training and evaluating KEPLER, we need a KG with (1) large amounts of knowledge facts, (2) aligned entity descriptions, and (3) a reasonable inductive-setting data split, which cannot be satisfied by existing KE benchmarks. Our contribution is three-fold: (1) We propose KEPLER, a knowledge-enhanced PLM obtained by jointly optimizing the KE and MLM objectives, which brings great improvements on a wide range of NLP tasks. (2) By encoding text descriptions as entity embeddings, KEPLER shows its effectiveness as a KE model, especially in the inductive setting. (3) We introduce Wikidata5M, a new large-scale KG dataset, which shall promote research on large-scale KGs, inductive KE, and the interactions between KGs and NLP.
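Building on the sketch above, the following hedged illustration shows why such a model is naturally inductive: entities never seen during training can still be scored for link prediction because their embeddings are computed from their descriptions. The helper name (rank_tails) and its arguments are illustrative assumptions, not an API from the paper's codebase.

```python
import torch

@torch.no_grad()
def rank_tails(model, tokenizer, head_desc, rel_id, candidate_descs, device="cpu"):
    """Rank candidate tail entities for (head, relation, ?) by TransE-style plausibility."""
    def embed(texts):
        batch = tokenizer(texts, padding=True, truncation=True,
                          max_length=128, return_tensors="pt").to(device)
        return model.encode_entity(batch["input_ids"], batch["attention_mask"])

    h = embed([head_desc])                                  # (1, hidden)
    r = model.rel_emb(torch.tensor([rel_id], device=device))
    t = embed(candidate_descs)                              # (num_candidates, hidden)
    scores = -torch.norm(h + r - t, p=2, dim=-1)            # higher = more plausible
    return scores.argsort(descending=True).tolist()
```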

KEPLER
Encoder
Knowledge Embedding
Masked Language Modeling
Training Objectives
Variants and Implementations
Wikidata5M
Data Collection
Data Split
Benchmark
Experimental Setting
NLP Tasks
KE Tasks
Ablation Study
Knowledge Probing Experiment
Running Time Comparison
Correlation with Entity Frequency
Understanding Text or Storing Knowledge
Related Work
Findings
Conclusion and Future Work

