Abstract
We introduce a novel nonparametric Bayesian model for the induction of Combinatory Categorial Grammars from POS-tagged text. It achieves state-of-the-art performance on a number of languages and induces linguistically plausible lexicons.
Highlights
What grammatical representation is appropriate for unsupervised grammar induction? Initial attempts with context-free grammars (CFGs) were not very successful (Carroll and Charniak, 1992; Charniak, 1993).
Dependency grammars make it difficult to capture non-local structures, and Blunsom and Cohn (2010) show that it may be advantageous to reformulate the underlying dependency grammar as a tree-substitution grammar (TSG) that pairs words with treelets specifying the number of left and right dependents they take. We explore yet another option: instead of dependency grammars, we use Combinatory Categorial Grammar (CCG, Steedman (1996; 2000)), a linguistically expressive formalism that pairs lexical items with rich categories that capture all language-specific information.
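To make "rich categories" concrete, here is a minimal Python sketch of CCG categories combining by forward and backward application. It assumes only two atomic categories, S and N, and only the two application rules; the class and function names (`Atom`, `Functor`, `forward_apply`, `backward_apply`) are hypothetical. Full CCG adds further combinators such as composition and type-raising.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    """An atomic category such as S (sentence) or N (noun)."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Functor:
    """A complex category: result/arg seeks arg to its right, result\\arg to its left."""
    result: "Category"
    slash: str  # "/" or "\\"
    arg: "Category"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.arg})"


Category = Union[Atom, Functor]


def forward_apply(left: Category, right: Category) -> Optional[Category]:
    """Forward application: X/Y  Y  =>  X (None if the rule does not apply)."""
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def backward_apply(left: Category, right: Category) -> Optional[Category]:
    """Backward application: Y  X\\Y  =>  X (None if the rule does not apply)."""
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


S, N = Atom("S"), Atom("N")
# English-style transitive verb: takes an object N on its right, then a subject N on its left.
transitive = Functor(Functor(S, "\\", N), "/", N)

vp = forward_apply(transitive, N)   # (S\N)/N  N  =>  S\N
sentence = backward_apply(N, vp)    # N  S\N   =>  S
print(vp, sentence)
```

The verb's category alone records that it takes two noun arguments and on which side each one appears; combining it with its arguments needs no grammar-specific rules beyond the generic combinators.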
7.1 PASCAL Challenge on Grammar Induction
In Table 1, we compare the performance of the basic Argument model (MLE), our Hierarchical Dirichlet Process (HDP) model under four different hyperparameter settings, and the systems presented in the PASCAL Challenge on Grammar Induction (Gelling et al., 2012).
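As general background on what the hyperparameters of a Dirichlet-process prior control, the sketch below simulates a Chinese restaurant process, the sequential view of a single Dirichlet process. This is not the authors' HDP model, and it does not reproduce their hyperparameter settings; the function name `crp_table_counts` is hypothetical. An HDP stacks several such processes so that groups share a common base distribution.

```python
import random


def crp_table_counts(n_customers: int, alpha: float, seed: int = 0) -> list[int]:
    """Simulate a Chinese restaurant process with concentration parameter alpha.

    Customer i (0-based) joins existing table k with probability n_k / (i + alpha)
    and opens a new table with probability alpha / (i + alpha).
    Returns the number of customers seated at each table.
    """
    rng = random.Random(seed)
    tables: list[int] = []
    for _ in range(n_customers):
        # One weight per existing table (its current size), plus alpha for a new table.
        weights = tables + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)  # open a new table
        else:
            tables[k] += 1    # join existing table k
    return tables


for alpha in (0.1, 1.0, 10.0):
    counts = crp_table_counts(1000, alpha, seed=42)
    print(f"alpha={alpha}: {len(counts)} tables")
```

Larger values of alpha make new tables (new distinct values) more likely, so the number of tables grows roughly like alpha times the log of the number of customers; small values concentrate mass on a few tables.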
Summary
What grammatical representation is appropriate for unsupervised grammar induction? Initial attempts with context-free grammars (CFGs) were not very successful (Carroll and Charniak, 1992; Charniak, 1993). We explore yet another option: instead of dependency grammars, we use Combinatory Categorial Grammar (CCG, Steedman (1996; 2000)), a linguistically expressive formalism that pairs lexical items with rich categories that capture all language-specific information. This may seem a puzzling choice, since CCG requires a significantly larger inventory of categories than is commonly assumed for CFGs. However, unlike CFG nonterminals, CCG categories are not arbitrary symbols: they encode, and are determined by, the basic word order of the language and the number of arguments each word takes.
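The claim that categories are determined by word order and arity can be illustrated with a small Python sketch that derives a verb's category from a basic word order string (such as "SVO") and the number of noun arguments. It is only an illustration, not the paper's induction procedure, and it rests on assumptions: atomic categories S and N, subject and object as the only arguments, application as the only combinator, and a hypothetical convention that post-verbal arguments are consumed before pre-verbal ones, nearest first. The function name `verb_category` is hypothetical.

```python
def verb_category(word_order: str, arity: int) -> str:
    """Derive a verb category from a basic word order and its number of arguments.

    word_order: a permutation of "S", "V", "O" (subject, verb, object), e.g. "SOV".
    arity: 1 for an intransitive verb (subject only), 2 for a transitive verb.
    """
    order = word_order.upper()
    if sorted(order) != ["O", "S", "V"]:
        raise ValueError("word_order must be a permutation of 'S', 'V', 'O'")
    if arity not in (1, 2):
        raise ValueError("arity must be 1 or 2")

    v = order.index("V")
    roles = ["S"] if arity == 1 else ["S", "O"]  # role labels, not categories
    # Arguments after the verb take "/", arguments before it take "\", nearest first.
    right = sorted((r for r in roles if order.index(r) > v), key=lambda r: order.index(r) - v)
    left = sorted((r for r in roles if order.index(r) < v), key=lambda r: v - order.index(r))
    # Assumed convention: consume right-hand arguments before left-hand ones.
    consumption = [("/", r) for r in right] + [("\\", r) for r in left]

    # The first argument consumed is the outermost slash, so wrap from the last one inward.
    category = "S"
    for slash, _role in reversed(consumption):
        if category != "S":
            category = f"({category})"
        category = f"{category}{slash}N"
    return category


for order in ("SVO", "SOV", "VSO"):
    print(order, verb_category(order, 1), verb_category(order, 2))
```

Under these assumptions an SVO language yields S\N and (S\N)/N, an SOV language S\N and (S\N)\N, and a VSO language S/N and (S/N)/N, so once the word order is fixed, the candidate verb categories follow from the number of arguments.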
More From: Transactions of the Association for Computational Linguistics