Abstract

Neural network language models (NNLMs), including feed-forward NNLMs (FNNLMs) and recurrent NNLMs (RNNLMs), have proved to be powerful tools for sequence modeling. One main concern with NNLMs is the heavy computational burden of the output layer, where the output must be probabilistically normalized and the normalizing factors require substantial computation. How to rescore an N-best list or lattice quickly with an NNLM therefore attracts much attention in large-scale applications. In this paper, the statistical characteristics of the normalizing factors are investigated on the N-best list. Based on these observations, we propose to approximate the normalizing factor of each hypothesis as a constant proportional to the number of words in the hypothesis. The unnormalized NNLM is then combined with a back-off N-gram for fast rescoring; it can be evaluated very quickly without normalization in the output layer, reducing the complexity significantly. We apply the proposed method to a well-tuned context-dependent deep neural network hidden Markov model (CD-DNN-HMM) speech recognition system on the English Switchboard phone-call speech-to-text task, where both an FNNLM and an RNNLM are trained to demonstrate the method. Experimental results show that the unnormalized probability of the NNLM is quite complementary to that of the back-off N-gram, and that combining the two further reduces the word error rate at little computational cost.
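
To make the approximation concrete, here is a minimal worked sketch of the scoring idea described above (not the paper's exact derivation): a_v(h_t) denotes the pre-softmax output activation for word v given history h_t, V the vocabulary, C the assumed per-word constant, and lambda an illustrative interpolation weight for combining with the back-off N-gram score.

```latex
% Softmax-normalized NNLM probability of word w_t given history h_t
P(w_t \mid h_t) = \frac{\exp\!\big(a_{w_t}(h_t)\big)}{Z(h_t)},
\qquad
Z(h_t) = \sum_{v \in V} \exp\!\big(a_v(h_t)\big)

% Log-probability of a hypothesis W = w_1 \dots w_T; the normalizing
% factor is approximated as a constant C per word
\log P(W) = \sum_{t=1}^{T} \Big[ a_{w_t}(h_t) - \log Z(h_t) \Big]
\;\approx\; \sum_{t=1}^{T} a_{w_t}(h_t) \;-\; T\,C

% Rescoring score: unnormalized NNLM interpolated with the back-off N-gram
S(W) = \lambda \Big( \sum_{t=1}^{T} a_{w_t}(h_t) - T\,C \Big)
       + (1-\lambda)\,\log P_{\mathrm{ngram}}(W)
```

Because Z(h_t) no longer has to be summed over the vocabulary at rescoring time, the output-layer cost per word drops from O(|V|) to O(1), which is the source of the speed-up claimed above.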

Highlights

  • The output of a speech-to-text (STT) system is usually in a multi-candidate form encoded as a lattice or an N-best list

  • It is worth noting that the fast UP-FNNLM (feed-forward NNLM with unnormalized probabilities) is more than 25 times faster than the FNNLM with a class layer and more than 1,100 times faster than the standard FNNLM

  • The language scores of the back-off 3-gram with Kneser–Ney smoothing (KN3) are usually available in the lattice or N-best list, so the UP-RNNLM (recurrent NNLM with unnormalized probabilities) combined with the KN3 reduces the word error rate (WER) by 0.8% and 1.2% absolute on the Hub5'00-SWB and RT03S-FSH sets, respectively

Summary

Introduction

The output of a speech-to-text (STT) system is usually in a multi-candidate form encoded as a lattice or an N-best list. We apply our proposed method to a well-tuned context-dependent deep neural network hidden Markov model (CD-DNN-HMM) speech recognition system on the English Switchboard speech-to-text task. Both a feed-forward NNLM and a recurrent NNLM are well trained to verify the effectiveness of our method. As our method is theoretically founded on statistical observations, we first introduce the experimental setup, including the speech recognizer, N-best hypotheses, NNLM structure, and NNLM training, in Section 2 for convenience.

2.1 Speech recognizer and N-best hypotheses

The effectiveness of the proposed method is evaluated on the STT task with the 309-hour Switchboard-I training set [15]. The top 100-best hypotheses are rescored and reranked with other language models, such as a back-off 5-gram, an FNNLM, and an RNNLM, to improve performance.
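
The following is a minimal sketch of how such 100-best rescoring could be organized, assuming each hypothesis already carries an acoustic score and a KN-smoothed 3-gram score taken from the lattice; `Hypothesis`, `unnormalized_nnlm_logit`, `per_word_const`, and the interpolation weights are hypothetical names and values introduced here for illustration, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Hypothesis:
    words: Sequence[str]     # word sequence of this N-best entry
    acoustic_score: float    # log acoustic likelihood from the decoder
    kn3_score: float         # log probability from the back-off 3-gram (KN3)


def rescore_nbest(
    hyps: List[Hypothesis],
    unnormalized_nnlm_logit: Callable[[Sequence[str], int], float],
    per_word_const: float,   # constant approximating log Z per word
    lam_am: float = 1.0,     # acoustic weight (illustrative value)
    lam_ngram: float = 0.5,  # KN3 weight (illustrative value)
    lam_nnlm: float = 0.5,   # unnormalized NNLM weight (illustrative value)
) -> Optional[Hypothesis]:
    """Rerank an N-best list with an unnormalized NNLM score.

    The NNLM score of a hypothesis is the sum of its unnormalized output
    activations minus (number of words * per_word_const), i.e. the
    normalizing factor is approximated as a constant per word.
    """
    best, best_score = None, float("-inf")
    for hyp in hyps:
        nnlm_score = sum(
            unnormalized_nnlm_logit(hyp.words, t) for t in range(len(hyp.words))
        ) - len(hyp.words) * per_word_const
        total = (
            lam_am * hyp.acoustic_score
            + lam_ngram * hyp.kn3_score
            + lam_nnlm * nnlm_score
        )
        if total > best_score:
            best, best_score = hyp, total
    return best
```

In practice the interpolation weights and the per-word constant would be tuned on a development set rather than fixed as above.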

Structure and training of NNLM
Statistics of normalizing factors on N-best hypotheses
Normalizing factor for one word
Normalizing factor for one hypothesis
Combining unnormalized NNLM and back-off N-gram
Findings
Conclusions