Abstract

Deep learning is well known to improve speech recognition performance significantly. In deep learning-based systems, a deep neural network hidden Markov model (DNN-HMM) serves as the acoustic model (AM). Speaker adaptation techniques for DNN-HMMs have also been investigated recently. This work aims to improve unsupervised batch adaptation of DNN-HMMs. The proposed adaptation method is based on cross-adaptation, which exploits complementary information derived from several systems. In the cross-adaptation procedure, Gaussian mixture model HMM (GMM-HMM) adaptation, DNN-HMM adaptation, and language model (LM) adaptation are conducted sequentially. Evaluated on a Japanese lecture speech recognition task, the proposed method reduced the error rate by 13.5% compared with the baseline DNN-HMM-based large vocabulary continuous speech recognition (LVCSR) system.
