Abstract

Adaptor grammars are a flexible, powerful formalism for defining nonparametric, unsupervised models of grammar productions. This flexibility comes at the cost of expensive inference. We address the difficulty of inference through an online algorithm which uses a hybrid of Markov chain Monte Carlo and variational inference. We show that this inference strategy improves scalability without sacrificing performance on unsupervised word segmentation and topic modeling tasks.

Highlights

  • Nonparametric Bayesian models are effective tools to discover latent structure in data (Muller and Quintana, 2004)

  • We evaluate our online adaptor grammar on the task of word segmentation, which focuses on identifying word boundaries from a sequence of characters

  • Zhai and Boyd-Graber (2013) introduce an inference framework, INFVOC, to discover words from a Dirichlet process with a character n-gram base distribution. We show that their complicated model and online inference can be captured and extended via an appropriate probabilistic context-free grammar (PCFG) and our online adaptor grammar inference algorithm


Summary

Introduction

Nonparametric Bayesian models are effective tools to discover latent structure in data (Muller and Quintana, 2004). These models have had great success in text analysis, especially syntax (Shindo et al., 2012). We focus on adaptor grammars (Johnson et al., 2006), syntactic nonparametric models based on probabilistic context-free grammars (PCFGs). Adaptor grammars weaken the strong statistical independence assumptions that PCFGs make (Section 2). These weaker assumptions come at the cost of expensive inference. In a PCFG, a sequence of terminals (the yield) is generated by recursively rewriting nonterminals as sequences of child symbols (each either a nonterminal or a terminal symbol). This builds a hierarchical phrase-tree structure for every yield.
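The generative process described above, in which nonterminals are rewritten recursively until only terminals remain, can be illustrated with a minimal sketch. The grammar below is a hypothetical toy example (the names `TOY_PCFG`, `sample_tree`, and `tree_yield` and all rule probabilities are assumptions made for illustration, not taken from the paper), and it shows a plain PCFG only, without the adaptor component.

```python
import random

# Hypothetical toy grammar (an assumption for illustration only).
# Each nonterminal maps to a list of (probability, right-hand side) rules.
# Any symbol that is not a key of the dict is treated as a terminal.
TOY_PCFG = {
    "Sentence": [(0.6, ["Word", "Sentence"]), (0.4, ["Word"])],
    "Word": [(0.5, ["Char", "Word"]), (0.5, ["Char"])],
    "Char": [(0.5, ["a"]), (0.5, ["b"])],
}


def sample_tree(symbol, grammar, rng):
    """Recursively rewrite `symbol` and return a (symbol, children) tree.

    Terminals are returned as plain strings.
    """
    if symbol not in grammar:
        return symbol  # terminal: nothing left to rewrite
    rules = grammar[symbol]
    weights = [prob for prob, _ in rules]
    # Pick one rewrite rule in proportion to its probability.
    _, rhs = rng.choices(rules, weights=weights, k=1)[0]
    return (symbol, [sample_tree(child, grammar, rng) for child in rhs])


def tree_yield(tree):
    """Collect the terminals at the leaves of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    terminals = []
    for child in children:
        terminals.extend(tree_yield(child))
    return terminals


if __name__ == "__main__":
    rng = random.Random(0)
    tree = sample_tree("Sentence", TOY_PCFG, rng)
    print("".join(tree_yield(tree)))
```

Each rule choice here depends only on the nonterminal being rewritten. That is the strong independence assumption of a PCFG that the introduction says adaptor grammars weaken.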

