Abstract

Lattice-free maximum mutual information (LFMMI) was recently proposed as a combination of the ideas behind hidden-Markov-model-based acoustic models (AMs) and connectionist-temporal-classification-based AMs. In this paper, we investigate LFMMI from the perspectives of model combination, teacher-student training, and unsupervised speaker adaptation. In particular, we thoroughly investigate the use of the "sequence-level" Kullback-Leibler divergence, together with a novel and simple error derivation, to enhance LFMMI-based AMs. In our experiments, we used the Corpus of Spontaneous Japanese (CSJ). Our best AM was an ensemble of three types of time delay neural networks and one long short-term memory-based network, and it achieved a word error rate (WER) of 6.94%, which is, to the best of our knowledge, the best published result for the CSJ.
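The abstract mentions teacher-student training with a Kullback-Leibler (KL) divergence objective. As an illustration only, the sketch below shows a conventional frame-level KL distillation loss between teacher and student posteriors; the paper's actual contribution is a sequence-level KL computed over lattice-free posteriors, so treat this simplified per-frame variant (and all function names here) as our own assumption, not the authors' method.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def frame_kl_distillation_loss(teacher_logits, student_logits):
    """Mean per-frame KL(teacher || student) for (T, C) logit arrays.

    T = number of acoustic frames, C = number of output units.
    """
    p = softmax(teacher_logits)                 # teacher posteriors
    log_p = np.log(p + 1e-12)
    log_q = np.log(softmax(student_logits) + 1e-12)  # student log-posteriors
    # Sum KL over classes per frame, then average over frames.
    return float((p * (log_p - log_q)).sum(axis=-1).mean())
```

For this frame-level form, the gradient of the loss with respect to the student logits reduces to the simple difference of posteriors (student minus teacher), which is one reason KL-based distillation objectives are attractive for neural AM training.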
