Abstract

How can pretrained language models (PLMs) learn factual knowledge from the training set? We investigate the two most important mechanisms: reasoning and memorization. Prior work has attempted to quantify the number of facts PLMs learn, but we present, using synthetic data, the first study that investigates the causal relation between facts present in training and facts learned by the PLM. For reasoning, we show that PLMs seem to learn to apply some symbolic reasoning rules correctly but struggle with others, including two-hop reasoning. Further analysis suggests that even the application of learned reasoning rules is flawed. For memorization, we identify schema conformity (facts systematically supported by other facts) and frequency as key factors for its success.
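
To illustrate what is meant by symbolic reasoning rules such as symmetry (SYM), inversion (INV), and two-hop composition (COMP), here is a minimal sketch that derives implied facts from stated subject–relation–object triples. The entities, relations, and rule instances are invented for this example and are not the paper's data.

```python
# Minimal sketch: deriving implied facts from stated (subject, relation, object)
# triples via symbolic rules. Entity and relation names are invented.

stated_facts = {
    ("anna", "married_to", "ben"),      # symmetric relation
    ("ben", "child_of", "carla"),       # has an inverse: parent_of
    ("carla", "lives_in", "rome"),
    ("rome", "located_in", "italy"),    # supports two-hop composition
}

symmetric = {"married_to"}
inverse_of = {"child_of": "parent_of"}
# Two-hop (composition) rule: lives_in followed by located_in implies lives_in.
compositions = {("lives_in", "located_in"): "lives_in"}

def implied_facts(facts):
    """Apply symmetry, inversion, and composition rules once."""
    derived = set()
    for (s, r, o) in facts:
        if r in symmetric:
            derived.add((o, r, s))                  # anna married_to ben => ben married_to anna
        if r in inverse_of:
            derived.add((o, inverse_of[r], s))      # ben child_of carla => carla parent_of ben
        for (s2, r2, o2) in facts:
            if o == s2 and (r, r2) in compositions: # two-hop chain s -r-> o -r2-> o2
                derived.add((s, compositions[(r, r2)], o2))
    return derived - facts

print(implied_facts(stated_facts))
# A PLM trained only on the stated facts is then queried for the derived ones,
# e.g. "carla lives in [MASK]" with gold answer "italy".
```

A model that has learned the rules should predict the derived facts even though they never appear verbatim in training, which is the causal question the study isolates with synthetic data.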

Highlights

  • Recent work on knowledge captured by pretrained language models (PLMs) focuses on probing, a methodology that identifies the set of facts a PLM has command of.

  • We identify two important factors that lead to successful memorization. (i) Frequency: other things being equal, low-frequency facts are not learned whereas frequent facts are. (ii) Schema conformity: facts that conform with the overall schema of their entities (e.g., “sparrows can fly” in a corpus with many similar facts about birds) are easier to memorize than exceptions (e.g., “penguins can dive”); see the sketch after this list.

  • We studied BERT’s ability to capture knowledge from its training corpus by investigating its reasoning and memorization capabilities.
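
To make the two memorization factors concrete, here is a toy sketch of how a synthetic corpus can pair schema-conformant facts with exceptions while controlling how often each fact is seen. The schema, entities, and counts are invented for illustration and are not the paper's actual data-generation setup.

```python
# Toy sketch (invented schema): build a synthetic corpus in which most facts
# conform to an entity-type schema ("birds can fly") while one is an exception
# ("penguins can dive"), and control how often each fact appears in training.
import random

random.seed(0)

birds = ["sparrow", "robin", "eagle", "penguin"]
schema_fact = {b: (b, "can", "fly") for b in birds}   # schema-conformant facts
schema_fact["penguin"] = ("penguin", "can", "dive")   # the exception

# Frequency control: how many times each fact is written into the corpus.
frequency = {"sparrow": 100, "robin": 100, "eagle": 10, "penguin": 10}

corpus = []
for bird, fact in schema_fact.items():
    corpus.extend([" ".join(fact)] * frequency[bird])
random.shuffle(corpus)

print(len(corpus), "training sentences, e.g.:", corpus[:3])
# The finding summarized above: frequent, schema-conformant facts are memorized,
# while rare exceptions are much harder for the model to retain.
```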


Summary

Introduction

Pretrained language models (PLMs) like BERT (Devlin et al., 2019), GPT-2 (Radford et al., 2019) and RoBERTa (Liu et al., 2019) have emerged as universal tools that capture a diverse range of linguistic and – as more and more evidence suggests – factual knowledge (Petroni et al., 2019; Radford et al., 2019). We pose the following two questions: a) Symbolic reasoning: Are PLMs able to infer knowledge not seen explicitly during pretraining? b) Memorization: Are PLMs able to memorize facts seen explicitly during pretraining? To test whether BERT has learned a fact, we mask the object, thereby generating a cloze-style query, and evaluate its predictions. During the course of pretraining, BERT sees more data than any human could read in a lifetime, an amount of knowledge that surpasses its storage capacity; we simulate this with a scaled-down version of BERT and a training set that ensures that BERT cannot memorize all facts in training. Synthetic corpora provide an effective way of investigating reasoning by giving full control over what knowledge is seen and which rules are employed in generating the data. The model’s task is to predict the correct object.
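
As a concrete illustration of such a cloze-style query, the sketch below masks the object of a fact and inspects the model's top predictions. It uses the public bert-base-uncased checkpoint and a LAMA-style example fact, rather than the scaled-down BERT trained on synthetic corpora that the paper studies.

```python
# Illustration of a cloze-style query: mask the object of a fact and check
# whether the model ranks the correct object highly.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

query = "Paris is the capital of [MASK]."
gold_object = "france"

predictions = fill_mask(query, top_k=5)
for p in predictions:
    print(f"{p['token_str']:>10}  {p['score']:.3f}")

# The fact counts as learned if the gold object is the top prediction (or ranked
# within the top k), which is how cloze-style probing is typically scored.
print("correct@5:", any(p["token_str"].strip() == gold_object for p in predictions))
```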

The remaining sections cover:

  • Symbolic Reasoning
  • Memorization
  • BERT Model
  • Analysis of SYM and INV
  • Analysis of NEG
  • Analysis of COMP
  • Natural Language Corpora
  • Limitations
  • Related Work
  • Conclusion
  • Model hyperparameters
  • Findings
  • Data hyperparameters