Abstract
Modeling the less constrained grammar and word order of conversational speech poses a great challenge to conventional back-off n-gram language models (BNLMs). Recurrent Neural Network Language Models (RNNLMs) can provide much better predictions; however, in real-time Automatic Speech Recognition (ASR) systems (e.g., speech dictation), the processing delay caused by two-pass decoding cannot be tolerated. In this paper, we investigate n-gram-based language modeling techniques that can be applied in a single-pass ASR system to approximate the performance of an RNNLM. Perplexity and word error rate (WER) of BNLMs, BNLM approximations of RNNLMs (RNN-BNLMs), and RNN n-grams are compared on our Hungarian ASR task. The rich morphology of agglutinative languages (such as Hungarian) is often handled with subword language models, hence we also evaluated subword BNLMs, RNN-BNLMs, and RNN n-grams. We found that a subword RNN-BNLM can approach the performance of an RNN 4-gram model and recover roughly 40% of the RNNLM perplexity reduction. All in all, we managed to improve the WER of our call-center speech transcription system by 8% relative without affecting its real-time operation.
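To make the central metric concrete, the following is a minimal, self-contained sketch of how a back-off bigram model (the simplest form of BNLM) assigns probabilities and how perplexity is computed from them. The toy corpus, the discount value, and all function names are hypothetical illustrations, not the paper's Hungarian setup or the authors' implementation.

```python
import math
from collections import Counter

# Toy training and test data -- purely illustrative, not the paper's
# Hungarian call-center corpus.
train = "the cat sat on the mat the cat ate".split()
test = "the cat on".split()

unigrams = Counter(train)
bigrams = Counter(zip(train, train[1:]))
N, V = len(train), len(unigrams)
d = 0.75  # absolute discount (hypothetical value)

def p_unigram(w):
    # Add-one smoothed unigram, so unseen words keep non-zero mass.
    return (unigrams[w] + 1) / (N + V)

def p_backoff(w, prev):
    # Back-off scheme: use the discounted bigram estimate when the bigram
    # was observed; otherwise redistribute the freed probability mass over
    # the unseen continuations via the unigram model.
    if bigrams[(prev, w)] > 0:
        return (bigrams[(prev, w)] - d) / unigrams[prev]
    seen = {b for (a, b) in bigrams if a == prev}
    leftover = d * len(seen) / unigrams[prev]  # mass freed by discounting
    denom = sum(p_unigram(u) for u in unigrams if u not in seen)
    return leftover * p_unigram(w) / denom

# Perplexity = exp(-mean log-probability) over the test bigram events;
# a lower perplexity means the model predicts the text better.
logp = sum(math.log(p_backoff(w, prev)) for prev, w in zip(test, test[1:]))
print(f"perplexity: {math.exp(-logp / (len(test) - 1)):.2f}")
```

The same perplexity definition applies to the RNNLMs, RNN-BNLMs, and subword models compared in the paper; only the probability estimator changes.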