Abstract

How can pretrained language models (PLMs) learn factual knowledge from the training set? We investigate the two most important mechanisms: reasoning and memorization. Prior work has attempted to quantify the number of facts PLMs learn, but we present, using synthetic data, the first study that investigates the causal relation between facts present in training and facts learned by the PLM. For reasoning, we show that PLMs seem to learn to apply some symbolic reasoning rules correctly but struggle with others, including two-hop reasoning. Further analysis suggests that even the application of learned reasoning rules is flawed. For memorization, we identify schema conformity (facts systematically supported by other facts) and frequency as key factors for its success.
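
To illustrate what is meant by symbolic reasoning rules such as symmetry (SYM), inversion (INV), and two-hop composition (COMP), here is a minimal sketch that derives implied facts from stated subject–relation–object triples. The entities, relations, and rule instances are invented for this example and are not the paper's data.

```python
# Minimal sketch: deriving implied facts from stated (subject, relation, object)
# triples via symbolic rules. Entity and relation names are invented.

stated_facts = {
    ("anna", "married_to", "ben"),      # symmetric relation
    ("ben", "child_of", "carla"),       # has an inverse: parent_of
    ("carla", "lives_in", "rome"),
    ("rome", "located_in", "italy"),    # supports two-hop composition
}

symmetric = {"married_to"}
inverse_of = {"child_of": "parent_of"}
# Two-hop (composition) rule: lives_in followed by located_in implies lives_in.
compositions = {("lives_in", "located_in"): "lives_in"}

def implied_facts(facts):
    """Apply symmetry, inversion, and composition rules once."""
    derived = set()
    for (s, r, o) in facts:
        if r in symmetric:
            derived.add((o, r, s))                  # anna married_to ben => ben married_to anna
        if r in inverse_of:
            derived.add((o, inverse_of[r], s))      # ben child_of carla => carla parent_of ben
        for (s2, r2, o2) in facts:
            if o == s2 and (r, r2) in compositions: # two-hop chain s -r-> o -r2-> o2
                derived.add((s, compositions[(r, r2)], o2))
    return derived - facts

print(implied_facts(stated_facts))
# A PLM trained only on the stated facts is then queried for the derived ones,
# e.g. "carla lives in [MASK]" with gold answer "italy".
```

A model that has learned the rules should predict the derived facts even though they never appear verbatim in training, which is the causal question the study isolates with synthetic data.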

Highlights

  • Recent work on knowledge captured by pretrained language models (PLMs) focuses on probing, a methodology that identifies the set of facts a PLM has command of.

  • We identify two important factors that lead to successful memorization. (i) Frequency: other things being equal, low-frequency facts are not learned whereas frequent facts are. (ii) Schema conformity: facts that conform with the overall schema of their entities (e.g., “sparrows can fly” in a corpus with many similar facts about birds) are easier to memorize than exceptions (e.g., “penguins can dive”); see the sketch after this list.

  • We studied BERT’s ability to capture knowledge from its training corpus by investigating its reasoning and memorization capabilities.
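
To make the two memorization factors concrete, here is a toy sketch of how a synthetic corpus can pair schema-conformant facts with exceptions while controlling how often each fact is seen. The schema, entities, and counts are invented for illustration and are not the paper's actual data-generation setup.

```python
# Toy sketch (invented schema): build a synthetic corpus in which most facts
# conform to an entity-type schema ("birds can fly") while one is an exception
# ("penguins can dive"), and control how often each fact appears in training.
import random

random.seed(0)

birds = ["sparrow", "robin", "eagle", "penguin"]
schema_fact = {b: (b, "can", "fly") for b in birds}   # schema-conformant facts
schema_fact["penguin"] = ("penguin", "can", "dive")   # the exception

# Frequency control: how many times each fact is written into the corpus.
frequency = {"sparrow": 100, "robin": 100, "eagle": 10, "penguin": 10}

corpus = []
for bird, fact in schema_fact.items():
    corpus.extend([" ".join(fact)] * frequency[bird])
random.shuffle(corpus)

print(len(corpus), "training sentences, e.g.:", corpus[:3])
# The finding summarized above: frequent, schema-conformant facts are memorized,
# while rare exceptions are much harder for the model to retain.
```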


Summary

Introduction

Pretrained language models (PLMs) like BERT (Devlin et al., 2019), GPT-2 (Radford et al., 2019) and RoBERTa (Liu et al., 2019) have emerged as universal tools that capture a diverse range of linguistic and – as more and more evidence suggests – factual knowledge (Petroni et al., 2019; Radford et al., 2019). We pose the following two questions: a) Symbolic reasoning: Are PLMs able to infer knowledge not seen explicitly during pretraining? b) Memorization: Are PLMs able to memorize facts seen explicitly during pretraining? To test whether BERT has learned a fact, we mask the object, thereby generating a cloze-style query, and evaluate its predictions. During the course of pretraining, BERT sees more data than any human could read in a lifetime, an amount of knowledge that surpasses its storage capacity; we simulate this with a scaled-down version of BERT and a training set that ensures that BERT cannot memorize all facts in training. Synthetic corpora provide an effective way of investigating reasoning by giving full control over what knowledge is seen and which rules are employed in generating the data. The model’s task is to predict the correct object.
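
As a concrete illustration of such a cloze-style query, the sketch below masks the object of a fact and inspects the model's top predictions. It uses the public bert-base-uncased checkpoint and a LAMA-style example fact, rather than the scaled-down BERT trained on synthetic corpora that the paper studies.

```python
# Illustration of a cloze-style query: mask the object of a fact and check
# whether the model ranks the correct object highly.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

query = "Paris is the capital of [MASK]."
gold_object = "france"

predictions = fill_mask(query, top_k=5)
for p in predictions:
    print(f"{p['token_str']:>10}  {p['score']:.3f}")

# The fact counts as learned if the gold object is the top prediction (or ranked
# within the top k), which is how cloze-style probing is typically scored.
print("correct@5:", any(p["token_str"].strip() == gold_object for p in predictions))
```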

The remaining sections cover:

  • Symbolic Reasoning
  • Memorization
  • BERT Model
  • Analysis of SYM and INV
  • Analysis of NEG
  • Analysis of COMP
  • Natural Language Corpora
  • Limitations
  • Related Work
  • Conclusion
  • Model hyperparameters
  • Findings
  • Data hyperparameters