Abstract

Learning probabilistic context-free grammars (PCFGs) from strings has been a classic problem in computational linguistics since Horning (1969). Here we present an algorithm based on distributional learning that is a consistent estimator for a large class of PCFGs satisfying certain natural conditions, including being anchored (Stratos et al., 2016). We proceed via a reparameterization of (top–down) PCFGs that we call a bottom–up weighted context-free grammar. We show that if the grammar is anchored and satisfies additional restrictions on its ambiguity, then the parameters can be directly related to distributional properties of the anchoring strings; we show the asymptotic correctness of a naive estimator and present simulations on synthetic data showing that algorithms based on this approach have good finite-sample behavior.

Highlights

  • This paper presents an approach for strongly learning a linguistically interesting subclass of probabilistic context-free grammars (PCFGs) from strings in the realizable case

  • We assume that we have some PCFG that we are interested in learning and that we have access only to a sample of strings generated by the PCFG

  • Consider, for example, the distribution that generates a single string of length 3 with probability one and the various PCFGs that give rise to that same distribution; for reasons such as this, discussed in more detail later, no algorithm can strongly learn every PCFG from strings alone



Introduction

This paper presents an approach for strongly learning a linguistically interesting subclass of probabilistic context-free grammars (PCFGs) from strings in the realizable case. We assume that there is some PCFG we are interested in learning and that we have access only to a sample of strings generated by it (i.e., sampled from the distribution that the grammar defines). No algorithm can do this for all PCFGs: consider, for example, the distribution that generates a single string of length 3 with probability one, and the many structurally distinct PCFGs that give rise to that same distribution; we discuss such obstacles in more detail later. We therefore define some sufficient conditions on PCFGs for the algorithm to perform correctly. We define some simple structural conditions on the underlying CFGs (in Section 3), and we show that the resulting class of PCFGs is identifiable from strings, in the sense that any two PCFGs in the class that define the same distribution over strings will be isomorphic.
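The non-identifiability obstacle above can be made concrete with a small sketch. The following Python snippet (our own illustration; the grammar encoding and function names are not from the paper) builds two structurally different PCFGs that both generate the single string "a a a" with probability one, and computes the string distribution each one defines. Since the grammars are not isomorphic yet induce identical distributions, no learner observing strings alone could distinguish them.

```python
from collections import defaultdict

def string_distribution(rules):
    """Distribution over yields of the start symbol "S" of a PCFG.

    `rules` maps each nonterminal to a list of (rhs_tuple, prob) pairs;
    any symbol not appearing as a key is treated as a terminal.
    Assumes a non-recursive grammar, so all derivations are finite.
    """
    memo = {}

    def inside(sym):
        if sym not in rules:            # terminal: yields itself w.p. 1
            return {(sym,): 1.0}
        if sym in memo:
            return memo[sym]
        dist = defaultdict(float)
        for rhs, p in rules[sym]:
            # Combine the children's yield distributions by concatenation.
            partial = {(): p}
            for child in rhs:
                nxt = defaultdict(float)
                for s1, p1 in partial.items():
                    for s2, p2 in inside(child).items():
                        nxt[s1 + s2] += p1 * p2
                partial = nxt
            for s, q in partial.items():
                dist[s] += q
        memo[sym] = dict(dist)
        return memo[sym]

    return inside("S")

# Two non-isomorphic grammars in Chomsky normal form:
g1 = {"S": [(("A", "B"), 1.0)], "B": [(("A", "A"), 1.0)], "A": [(("a",), 1.0)]}
g2 = {"S": [(("B", "A"), 1.0)], "B": [(("A", "A"), 1.0)], "A": [(("a",), 1.0)]}

print(string_distribution(g1))  # {('a', 'a', 'a'): 1.0}
print(string_distribution(g2))  # {('a', 'a', 'a'): 1.0} -- same distribution
```

Both grammars yield "a a a" with probability one but attach the three terminals under different tree shapes, which is exactly why the paper restricts attention to a subclass of PCFGs (anchored, with structural conditions) within which the distribution does determine the grammar up to isomorphism.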

