Abstract

This paper proposes a method to train Weighted Finite State Transducer (WFST) based structural classifiers using deep neural network (DNN) acoustic features and recurrent neural network (RNN) language features for speech recognition. Structural classification is an effective approach to achieving highly accurate recognition of structured data, in which the classifier is optimized to maximize discriminative performance using different kinds of features. A WFST-based classifier, which can integrate acoustic, pronunciation, and language features embedded in a composed WFST, was recently extended to incorporate DNN bottleneck (DNNBN) features. In this paper, we further investigate the integration of an RNN language model (RNNLM) with the WFST classifier. To this end, we introduce a lattice rescoring method using an RNNLM for efficient classifier training. In a lecture transcription task, we reduced the word error rate from 19.2% to 18.6% by optimizing the WFST parameters for the DNNBN acoustic and RNNLM language features.
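To illustrate the kind of rescoring the abstract describes, the sketch below shows a simplified n-best rescoring step in which an RNNLM score is interpolated with an n-gram LM score and combined with the acoustic score before re-ranking. All function names, weights, and scores here are illustrative assumptions, not the authors' implementation, and a true lattice rescoring would operate on WFST paths rather than a flat n-best list.

```python
# Hypothetical sketch of n-best rescoring with an RNNLM, standing in for the
# lattice rescoring described in the abstract. Weights and scores are invented
# for illustration only.

def rescore(hypotheses, lam=0.5, lm_scale=12.0):
    """Return the hypothesis with the best combined score.

    Each hypothesis carries an acoustic log-likelihood, an n-gram LM
    log-probability, and an RNNLM log-probability. The two LM scores are
    linearly interpolated (weight `lam` on the RNNLM), scaled by `lm_scale`,
    and added to the acoustic score, mimicking a combined WFST path weight.
    """
    def combined(h):
        lm = lam * h["rnnlm"] + (1.0 - lam) * h["ngram"]
        return h["acoustic"] + lm_scale * lm
    return max(hypotheses, key=combined)

# Toy n-best list: the RNNLM strongly prefers the fluent hypothesis even
# though its acoustic score is slightly worse.
nbest = [
    {"text": "the cat sat", "acoustic": -120.0, "ngram": -7.1, "rnnlm": -6.2},
    {"text": "the cat sad", "acoustic": -119.5, "ngram": -9.8, "rnnlm": -10.4},
]
print(rescore(nbest)["text"])  # prints "the cat sat"
```

In practice the interpolation weight and LM scale would be tuned on held-out data, and the rescoring would be applied to lattice arcs so the WFST parameters can be optimized jointly, as the paper proposes.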
