Abstract

We propose a segmental neural language model that combines the generalization power of neural networks with the ability to discover word-like units that are latent in unsegmented character sequences. In contrast to previous segmentation models that treat word segmentation as an isolated task, our model unifies word discovery, learning how words fit together to form sentences, and, by conditioning the model on visual context, how words' meanings ground in representations of non-linguistic modalities. Experiments show that the unconditional model learns predictive distributions better than character LSTM models, discovers words competitively with nonparametric Bayesian word segmentation models, and that modeling language conditional on visual context improves performance on both.

Highlights

  • How infants discover words that make up their first language is a long-standing question in developmental psychology (Saffran et al., 1996)

  • The best language models, measured in terms of their ability to predict language, segment quite poorly (Chung et al., 2017; Wang et al., 2017; Kádár et al., 2018), while the strongest models in terms of word segmentation (Goldwater et al., 2009; Berg-Kirkpatrick et al., 2010) do not adequately account for the long-range dependencies that are manifest in language and that are captured by recurrent neural networks (Mikolov et al., 2010)

  • We argue that such a unified model is preferable to a pipeline model of language acquisition

Summary

Introduction

How infants discover words that make up their first language is a long-standing question in developmental psychology (Saffran et al., 1996). We introduce a single model that discovers words, learns how they fit together (not just locally, but across a complete sentence), and grounds them in learned representations of naturalistic non-linguistic visual contexts. We argue that such a unified model is preferable to a pipeline model of language acquisition (e.g., a model where words are learned by one character-aware model, and a full-sentence grammar is acquired by a second language model using the words predicted by the first). Our model makes a conditional independence assumption: each segment is generated independently of how the preceding context was segmented, given an LSTM encoding of the full character history. This assumption has been employed in a number of related models to permit the use of LSTMs to represent rich history while retaining the convenience of dynamic programming inference algorithms (Wang et al., 2017; Ling et al., 2017; Graves, 2012).
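
A short sketch may help make the role of this assumption concrete. The Python below is our illustration, not the authors' released code: it shows the forward dynamic program that the assumption licenses, summing over all segmentations of a character sequence. Here log_seg_prob is a hypothetical stand-in for the neural segment scorer, which in the SNLM would condition on an LSTM encoding of the preceding characters.

    import math

    def log_marginal(chars, log_seg_prob, max_seg_len=10):
        """Return log p(chars), marginalizing over all segmentations.

        log_seg_prob(i, j) is assumed to give log p(chars[i:j] | chars[:i]);
        this signature is a placeholder for the neural segment model.
        """
        n = len(chars)
        # alpha[t] = log-probability of generating chars[:t] under any segmentation
        alpha = [-math.inf] * (n + 1)
        alpha[0] = 0.0
        for t in range(1, n + 1):
            # the final segment ends at t and starts at some j within max_seg_len
            for j in range(max(0, t - max_seg_len), t):
                alpha[t] = logaddexp(alpha[t], alpha[j] + log_seg_prob(j, t))
        return alpha[n]

    def logaddexp(a, b):
        # numerically stable log(exp(a) + exp(b))
        if a == -math.inf:
            return b
        if b == -math.inf:
            return a
        m = max(a, b)
        return m + math.log(math.exp(a - m) + math.exp(b - m))

    # Toy usage with a uniform stand-in scorer over a 27-symbol alphabet:
    print(log_marginal("adoggy", lambda i, j: (j - i) * math.log(1 / 27), max_seg_len=4))

Because each alpha[t] reuses the values alpha[j] computed for shorter prefixes, the sum over exponentially many segmentations costs only O(n · max_seg_len) calls to the segment scorer.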

Contents

  • Segment generation
  • Inference
  • Expected length regularization
  • Datasets: English, Chinese, Image Caption Dataset
  • Experiments
  • Results
  • Related Work
  • Appendix A: Dataset statistics
  • Appendix C: SNLM Model Configuration
  • Appendix E: Learning
  • Appendix F: Evaluation Metrics