Abstract

The recurrent neural network language model (RNNLM) has shown significant promise for statistical language modeling. In this work, a new class-based output layer method is introduced to further improve the RNNLM. In this method, word class information is incorporated into the output layer by using the Brown clustering algorithm to estimate a class-based language model. Experimental results show that the new output layer with word clustering not only clearly improves convergence but also reduces perplexity and word error rate in large-vocabulary continuous speech recognition.
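For context, this builds on the standard class-based factorization of the output-layer probability; a sketch in our own notation (h_t denotes the history encoded by the recurrent hidden state and c(w) the class assigned to word w; neither symbol is taken from the paper) is:

```latex
% Class-based factorization of the output-layer probability: the softmax
% over the full vocabulary is replaced by a softmax over classes followed
% by a softmax over the words within the predicted class.
P(w_t \mid h_t) = P\big(c(w_t) \mid h_t\big) \cdot P\big(w_t \mid c(w_t), h_t\big)
```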

Highlights

  • Statistical language models estimate the probability of a word occurring in a given context, a task that plays an important role in many natural language processing applications such as speech recognition, machine translation, and information retrieval

  • The first two columns refer to the baseline RNNLM with frequency-based word clustering (RNNLM-Freq) and its interpolation with a standard back-off n-gram language model with Kneser-Ney smoothing (RNNLM-Freq + KN5), respectively; these results are consistent with those reported in [4]

  • We can see that the word error rate (WER) on the evaluation set is consistently reduced by 0.3% to 0.7% absolute compared with that of the recurrent neural network language model with frequency-based word clustering (RNNLM-Freq), and by 1.4% to 1.7% compared with the 1-best hypothesis


Summary

Introduction

Statistical language models estimate the probability of a word occurring in a given context, a task that plays an important role in many natural language processing applications such as speech recognition, machine translation, and information retrieval. It is reasonable to assume that similar words occur in the same context with similar probability; for example, ‘America,’ ‘China,’ and ‘Japan’ usually follow the same prepositions or serve as the subject of a sentence. Based on this assumption, neural network language models (NNLMs) [1,2] project discrete words into a continuous space in which similar words lie close to one another. In the baseline RNNLM, the output layer is factorized over word classes to speed up training, with words roughly clustered according to their frequencies; this increases training speed but degrades performance. Other tree-structured output layer methods have also been proposed to speed up NNLMs [7,8]. We instead introduce a new method for constructing the class-based output layer using the Brown clustering algorithm. This paper is organized as follows: in Section 2, we introduce our baseline RNNLM and the proposed Brown clustering method for constructing the output layer.
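To make the two pieces above concrete, here is a minimal Python sketch (ours, not the paper's code) of the baseline frequency-based class assignment and of the class-factorized probability lookup; all helper names and the toy counts are illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of (1) the baseline frequency-based
# word clustering and (2) the class-factorized output-layer probability
# P(w | h) = P(c(w) | h) * P(w | c(w), h).
from collections import Counter

def frequency_based_classes(unigram_counts: Counter, num_classes: int) -> dict:
    """Assign words to classes by slicing the cumulative unigram
    distribution into `num_classes` roughly equal-probability bins,
    as in the baseline RNNLM's frequency-based clustering."""
    total = sum(unigram_counts.values())
    classes, cum = {}, 0.0
    for word, count in unigram_counts.most_common():  # descending frequency
        classes[word] = min(int(cum * num_classes), num_classes - 1)
        cum += count / total
    return classes

def word_probability(p_class_given_h, p_word_given_class_h, word, classes):
    """Class-factorized output layer: instead of one softmax over the full
    vocabulary, multiply P(class | history) by P(word | class, history)."""
    c = classes[word]
    return p_class_given_h[c] * p_word_given_class_h[(word, c)]

# Toy usage: frequent words end up in low-index classes.
counts = Counter({"the": 50, "of": 30, "america": 10, "china": 5, "japan": 5})
print(frequency_based_classes(counts, num_classes=2))
# -> {'the': 0, 'of': 1, 'america': 1, 'china': 1, 'japan': 1}
```

In the proposed method, the `classes` mapping would instead come from Brown clustering, which groups words by distributional similarity rather than raw frequency; the factorized probability computation itself is unchanged.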

Model description
Word clustering for output layer
Findings
Conclusions

