Abstract
Inferring commonsense knowledge is a key challenge in machine learning. Due to the sparsity of training data, previous work has shown that supervised methods for commonsense knowledge mining underperform when evaluated on novel data. In this work, we develop a method for generating commonsense knowledge using a large, pre-trained bidirectional language model. By transforming relational triples into masked sentences, we can use this model to rank a triple’s validity by the estimated pointwise mutual information between the two entities. Since we do not update the weights of the bidirectional model, our approach is not biased by the coverage of any one commonsense knowledge base. Though we do worse on a held-out test set than models explicitly trained on a corresponding training set, our approach outperforms these methods when mining commonsense knowledge from new sources, suggesting that our unsupervised technique generalizes better than current supervised approaches.
Highlights
Commonsense knowledge consists of facts about the world which are assumed to be widely known
Commonsense knowledge base completion (CKBC) is a machine learning task motivated by the need to improve the coverage of these resources
We propose using the estimated point-wise mutual information (PMI) of the head h and tail t of a triple conditioned on the relation r, defined as PMI(t, h | r) = log p(t | h, r) − log p(t | r)
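The PMI score above can be sketched directly from its definition. In this minimal illustration, the two conditional probabilities are placeholder values standing in for masked-language-model estimates (the probability values and the triple `(guitar, UsedFor, music)` are hypothetical, not from the paper):

```python
import math

def pmi(p_t_given_hr: float, p_t_given_r: float) -> float:
    """Conditional point-wise mutual information of tail t and head h
    given relation r: PMI(t, h | r) = log p(t | h, r) - log p(t | r)."""
    return math.log(p_t_given_hr) - math.log(p_t_given_r)

# Hypothetical estimates for a triple such as (guitar, UsedFor, music):
# p(t | h, r) would be estimated with the head present in the sentence,
# p(t | r)    would be estimated with the head masked out.
score = pmi(0.20, 0.05)
print(round(score, 4))  # positive score: the head raises the tail's likelihood
```

A positive PMI indicates the head makes the tail more likely than it is under the relation alone, which is the signal used to rank a triple's validity.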
Summary
Commonsense knowledge consists of facts about the world which are assumed to be widely known. Jastrzebski et al. (2018) demonstrated that much of the data in the ConceptNet test set consisted of rephrased relations from the training set, and that this train-test leakage led to artificially inflated test performance metrics. This problem of train-test leakage is typical in knowledge base completion tasks (Toutanova et al., 2015; Dettmers et al., 2018). We use a bidirectional masked model, which provides a more flexible framework for likelihood estimation and allows us to estimate point-wise mutual information. Though it is beyond the scope of this paper, it would be interesting to adapt the methods presented here to the related task of generating new commonsense knowledge (Saito et al., 2018).
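The transformation of relational triples into masked sentences, which lets a bidirectional model score the tail tokens, can be sketched as follows. The templates and the mask token are illustrative assumptions, not the paper's exact template set:

```python
# Hypothetical templates mapping ConceptNet-style relations to
# natural-language sentences (illustrative only).
TEMPLATES = {
    "UsedFor": "{head} is used for {tail}.",
    "IsA": "{head} is a type of {tail}.",
}

def to_masked_sentence(head: str, relation: str, tail: str,
                       mask_token: str = "[MASK]") -> str:
    """Render a triple as a sentence, replacing each tail word with the
    mask token so a bidirectional language model can estimate its
    probability in context."""
    masked_tail = " ".join(mask_token for _ in tail.split())
    return TEMPLATES[relation].format(head=head, tail=masked_tail)

print(to_masked_sentence("guitar", "UsedFor", "playing music"))
```

Scoring the same sentence with the head also masked would give the marginal estimate p(t | r) needed for the PMI computation.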