Abstract

Lattice-free maximum mutual information (LFMMI) was recently proposed as a combination of the ideas behind hidden-Markov-model-based acoustic models (AMs) and connectionist-temporal-classification-based AMs. In this paper, we investigate LFMMI from the perspectives of model combination, teacher-student training, and unsupervised speaker adaptation. In particular, we thoroughly investigate the use of the "sequence-level" Kullback-Leibler divergence, together with a novel and simple error derivation, to enhance LFMMI-based AMs. In our experiments, we used the Corpus of Spontaneous Japanese (CSJ). Our best AM was an ensemble of three types of time delay neural networks and one long short-term memory-based network, and it achieved a word error rate (WER) of 6.94%, which is, to the best of our knowledge, the best published result for the CSJ.
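The abstract mentions teacher-student training with a Kullback-Leibler (KL) divergence objective. As an illustration only, the sketch below shows a conventional frame-level KL distillation loss between teacher and student posteriors; the paper's actual contribution is a sequence-level KL computed over lattice-free posteriors, so treat this simplified per-frame variant (and all function names here) as our own assumption, not the authors' method.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def frame_kl_distillation_loss(teacher_logits, student_logits):
    """Mean per-frame KL(teacher || student) for (T, C) logit arrays.

    T = number of acoustic frames, C = number of output units.
    """
    p = softmax(teacher_logits)                 # teacher posteriors
    log_p = np.log(p + 1e-12)
    log_q = np.log(softmax(student_logits) + 1e-12)  # student log-posteriors
    # Sum KL over classes per frame, then average over frames.
    return float((p * (log_p - log_q)).sum(axis=-1).mean())
```

For this frame-level form, the gradient of the loss with respect to the student logits reduces to the simple difference of posteriors (student minus teacher), which is one reason KL-based distillation objectives are attractive for neural AM training.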
