Abstract

Recently, a variety of theoretical learnability results have been obtained for subclasses of context-free grammars through distributional learning [1]. However, these results have not yet been extended to probabilistic grammars. In this work, we present a practical algorithm, with some proven properties, that learns a subclass of probabilistic grammars from positive data. A minimum satisfiability (MinSAT) solver is used to direct the search towards small grammars. Experiments on well-known context-free languages and artificial natural-language grammars give positive results. Moreover, our analysis shows that the class of grammars induced by our algorithm is, in theory, capable of modelling context-free features of natural language syntax. One of our experiments shows that our algorithm can potentially outperform the state of the art in unsupervised parsing on the WSJ10 corpus.
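
To make the search strategy concrete, the following is a minimal, hypothetical sketch of how a satisfiability solver can bias grammar selection towards small rule sets. It is not the paper's implementation: the abstract describes a MinSAT solver, whereas this sketch uses the RC2 MaxSAT solver from the python-sat library as a stand-in, and the candidate rules and coverage constraints are invented for illustration.

```python
# Hypothetical sketch (not the paper's code): biasing grammar selection
# towards small grammars with a weighted MaxSAT solver.
from pysat.formula import WCNF
from pysat.examples.rc2 import RC2

# Invented candidate rules for a toy grammar over {a, b}.
# Boolean variable i + 1 means "rule i is included in the grammar".
candidate_rules = ["S -> a S b", "S -> a b", "S -> S S", "S -> a", "S -> b"]
var = {rule: i + 1 for i, rule in enumerate(candidate_rules)}

wcnf = WCNF()

# Hard clauses (an illustrative simplification of "every observed string
# must be derivable"): "ab" requires "S -> a b", and "aabb" requires either
# the nesting rule or the concatenation rule together with both terminals.
wcnf.append([var["S -> a b"]])
wcnf.append([var["S -> a S b"], var["S -> S S"]])
wcnf.append([var["S -> a S b"], var["S -> a"]])
wcnf.append([var["S -> a S b"], var["S -> b"]])

# Soft clauses: unit cost for each included rule, so the solver prefers
# models that switch on as few rules as possible.
for rule in candidate_rules:
    wcnf.append([-var[rule]], weight=1)

with RC2(wcnf) as solver:
    model = solver.compute()  # optimal model; model[v - 1] > 0 iff var v true
    chosen = [rule for rule, v in var.items() if model[v - 1] > 0]
    print("Smallest rule set satisfying the constraints:", chosen)
    # Prints: ['S -> a S b', 'S -> a b'] -- the two-rule grammar beats
    # the four-rule alternative {S -> a b, S -> S S, S -> a, S -> b}.
```

The key idea carries over to the paper's setting: hard clauses encode that the learned grammar must account for the observed strings, while unit-weight soft clauses penalize every included rule, so an optimal model corresponds to a smallest satisfying grammar.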
