Abstract

In this paper, we present a survey on the application of recurrent neural networks to the task of statistical language modeling. Although it has been shown that these models obtain good performance on this task, often superior to other state-of-the-art techniques, they suffer from some important drawbacks, including a very long training time and limitations on the number of context words that can be taken into account in practice. Recent extensions to recurrent neural network models have been developed in an attempt to address these drawbacks. This paper gives an overview of the most important extensions. Each technique is described, and its performance on statistical language modeling, as reported in the existing literature, is discussed. Our structured overview makes it possible to identify the most promising techniques in the field of recurrent neural networks applied to language modeling, and it also highlights the techniques for which further research is required.

Highlights

  • Statistical language modeling (SLM) amounts to estimating the probability distribution of various linguistic units, such as words, sentences, and whole documents (Rosenfeld, 2000).

  • Long-span language models that capture long-distance dependencies are expected to be more powerful than low-order language models, but they pose the additional challenge of computational complexity during decoding, that is, finding the most probable sequence of words given the trained language model for a specific discourse context in tasks such as automatic speech recognition (ASR), machine translation (MT), and optical character recognition (OCR).

  • In the same study, several basic recurrent neural networks (RNNs) with different random initializations were combined into an ensemble, and this ensemble was found to perform significantly better than the within- and across-sentence-boundary language model (LM); a minimal sketch of such ensemble averaging follows this list.

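To make the ensemble idea in the last highlight concrete, the following is a minimal sketch that is not taken from the surveyed work: it averages the next-word distributions of several Elman-style RNN language models that differ only in their random seed. The names SimpleRNNLM, VOCAB_SIZE, and HIDDEN_SIZE are hypothetical, and the members are left untrained here for brevity; in practice each member would be trained on the same data before the distributions are combined.

    # Minimal illustration (not from the paper): linear interpolation of the
    # next-word distributions produced by several Elman-style RNN language
    # models that differ only in their random initialization.
    import numpy as np

    VOCAB_SIZE = 1000   # hypothetical vocabulary size
    HIDDEN_SIZE = 64    # hypothetical hidden-layer size

    class SimpleRNNLM:
        """One ensemble member: one-hot word -> recurrent hidden state -> softmax."""
        def __init__(self, seed):
            rng = np.random.default_rng(seed)
            self.W_xh = rng.normal(0.0, 0.1, (HIDDEN_SIZE, VOCAB_SIZE))
            self.W_hh = rng.normal(0.0, 0.1, (HIDDEN_SIZE, HIDDEN_SIZE))
            self.W_hy = rng.normal(0.0, 0.1, (VOCAB_SIZE, HIDDEN_SIZE))

        def next_word_distribution(self, word_ids):
            """P(w_t | w_1 ... w_{t-1}) computed from the recurrent hidden state."""
            h = np.zeros(HIDDEN_SIZE)
            for w in word_ids:
                x = np.zeros(VOCAB_SIZE)
                x[w] = 1.0
                h = np.tanh(self.W_xh @ x + self.W_hh @ h)
            logits = self.W_hy @ h
            e = np.exp(logits - logits.max())
            return e / e.sum()

    def ensemble_distribution(models, word_ids):
        """Average the member distributions; unequal interpolation weights also work."""
        return np.mean([m.next_word_distribution(word_ids) for m in models], axis=0)

    # Three members with different random seeds, combined on a toy context.
    models = [SimpleRNNLM(seed) for seed in (1, 2, 3)]
    p = ensemble_distribution(models, word_ids=[5, 42, 7])
    print(p.shape, round(p.sum(), 6))  # (1000,) 1.0
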

Summary

Introduction

Statistical language modeling (SLM) amounts to estimating the probability distribution of various linguistic units, such as words, sentences, and whole documents (Rosenfeld, 2000). A traditional task in SLM is to model the probability that a given word appears after a given sequence of words. N-gram models were among the earliest techniques to model the probability of observing a given word after some previous words (Bahl et al., 1983; Jelinek, 1998; Church, 1988). An N-gram model conditions on the previous N - 1 words; N - 1 is thus the context length, that is, the number of words that the model takes into account when estimating the probability of the next word. The probability distributions are smoothed by assigning non-zero probabilities to events that are not present in the training data. One reason for smoothing is to compensate for the fact that only a very small fraction of all proper names is mentioned in any given training data set (Kneser and Ney, 1995; Chelba et al., 2010; Moore, 2009).
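As a concrete illustration of the two points above, the sketch below, which is not taken from the cited work, implements a bigram model (N = 2, so the context length N - 1 is a single word) with simple add-one smoothing; the toy corpus is hypothetical, and the cited literature relies on more refined smoothing methods such as Kneser-Ney.

    # Minimal illustration (not from the paper): a bigram language model with
    # add-one (Laplace) smoothing, so that unseen (context, word) pairs still
    # receive a non-zero probability.
    from collections import Counter

    corpus = "the cat sat on the mat the dog sat on the rug".split()  # hypothetical toy corpus
    vocab = sorted(set(corpus))
    V = len(vocab)

    context_counts = Counter(corpus[:-1])                  # c(context)
    bigram_counts = Counter(zip(corpus[:-1], corpus[1:]))  # c(context, word)

    def p_add_one(word, context):
        """P(word | context) = (c(context, word) + 1) / (c(context) + V)."""
        return (bigram_counts[(context, word)] + 1) / (context_counts[context] + V)

    print(p_add_one("mat", "the"))  # seen bigram:   (1 + 1) / (4 + 7) ~ 0.18
    print(p_add_one("dog", "on"))   # unseen bigram: (0 + 1) / (2 + 7) ~ 0.11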
