Abstract
The recurrent neural network language model (RNNLM) has shown significant promise for statistical language modeling. In this work, a new class-based output-layer method is introduced to further improve the RNNLM: word-class information is incorporated into the output layer by using the Brown clustering algorithm to estimate a class-based language model. Experimental results show that the new output layer with word clustering not only clearly improves convergence but also reduces perplexity and word error rate in large-vocabulary continuous speech recognition.
Highlights
Statistical language models estimate the probability of a word occurring in a given context and play an important role in many natural language processing applications such as speech recognition, machine translation, and information retrieval.
The first two columns refer to the baseline RNNLM with frequency-based word clustering (RNNLM-Freq) and to that model interpolated with a standard Kneser-Ney 5-gram back-off language model (RNNLM-Freq + KN5), respectively, which is consistent with the results reported in [4].
We can see that the word error rate (WER) on the evaluation set is consistently reduced by 0.3% to 0.7% absolute compared with RNNLM-Freq, and by 1.4% to 1.7% compared with the 1-best hypothesis.
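The interpolation with KN5 mentioned above is, in its usual formulation, a per-word linear mixture of the two models' probabilities. A minimal sketch, assuming the standard formulation; the interpolation weight `lam` is a hypothetical value, not one reported in this paper:

```python
def interpolate(p_rnnlm: float, p_kn5: float, lam: float = 0.5) -> float:
    """Linearly interpolate RNNLM and KN5 word probabilities.

    lam is the RNNLM weight (hypothetical here); in practice it is
    tuned on held-out data to minimize perplexity.
    """
    return lam * p_rnnlm + (1.0 - lam) * p_kn5

# Example: mixing two word probabilities with equal weight
p = interpolate(0.02, 0.01)   # 0.5*0.02 + 0.5*0.01 = 0.015
```

Because both inputs are proper probabilities over the same vocabulary, the mixture remains a proper probability distribution for any weight in [0, 1].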
Summary
Statistical language models estimate the probability of a word occurring in a given context and play an important role in many natural language processing applications such as speech recognition, machine translation, and information retrieval. It is reasonable to assume that similar words occur in the same context with similar probability; for example, 'America,' 'China,' and 'Japan' usually come after the same prepositions or serve as the subject of a sentence. Based on this assumption, neural network language models (NNLMs) [1,2] project discrete words into a continuous space. Other tree-structured output-layer methods have been proposed to speed up NNLMs [7,8]. We introduce a new method for constructing a class-based output layer using the Brown clustering algorithm. In the frequency-based baseline, words are roughly clustered according to their frequencies, which increases training speed but degrades performance. This paper is organized as follows: in Section 2, we introduce our baseline RNNLM and the proposed Brown clustering method for constructing the output layer.
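The class-based output layer described above factorizes the word probability as P(w | h) = P(class(w) | h) · P(w | class(w), h), so for each prediction only a small class softmax and a within-class softmax are evaluated instead of a full-vocabulary softmax. A minimal NumPy sketch under assumed toy dimensions; the class assignment here is an illustrative stand-in for the Brown clusters, and all variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

V, C, H = 10, 3, 4                  # toy vocab size, class count, hidden size
word2class = np.arange(V) % C       # stand-in for a Brown-cluster assignment

# Output-layer weights: one matrix for classes, one for words
W_class = rng.standard_normal((C, H))
W_word = rng.standard_normal((V, H))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def word_prob(h, w):
    """P(w | h) = P(class(w) | h) * P(w | class(w), h)."""
    c = word2class[w]
    p_class = softmax(W_class @ h)[c]
    members = np.flatnonzero(word2class == c)  # only score words in w's class
    scores = W_word[members] @ h
    p_word = softmax(scores)[np.where(members == w)[0][0]]
    return p_class * p_word

h = rng.standard_normal(H)          # a hidden state from the recurrent layer
probs = np.array([word_prob(h, w) for w in range(V)])
```

Summing the within-class distribution of each class to 1 and weighting by the class distribution means the factorized probabilities over the whole vocabulary still sum to 1; the saving comes from scoring roughly C + V/C outputs per word rather than V.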
EURASIP Journal on Audio, Speech, and Music Processing