Abstract

Exploiting abundant unlabeled speech is important for improving acoustic model training in automatic speech recognition (ASR). Semi-supervised training incorporates unlabeled data alongside labeled data to enhance model training, but it suffers from error-prone automatic labels. Ensemble training builds a set of models and combines them to make the resulting model more general and robust, but it has not previously been applied to unlabeled data. In this work, we propose an effective semi-supervised training method for deep neural network (DNN) acoustic models that exploits the diversity among an ensemble of models. The resulting model improves performance on a lecture transcription task. Moreover, the proposed method also shows potential for DNN adaptation.
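The core idea of combining ensemble diversity with semi-supervised training can be illustrated with a minimal sketch: train several diverse models on the labeled data, let them vote on the unlabeled data, and keep only the pseudo-labels on which the ensemble agrees. This is a hedged toy illustration of agreement-based pseudo-labeling, not the paper's actual DNN training procedure; the threshold "models", the `min_agree` parameter, and all function names are hypothetical stand-ins.

```python
import random
from collections import Counter

def train_model(labeled, seed):
    # Toy "model": a decision threshold on 1-D features, perturbed
    # by the seed to mimic the diversity of an ensemble of DNNs.
    random.seed(seed)
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return threshold + random.uniform(-0.05, 0.05)

def predict(threshold, x):
    return 1 if x >= threshold else 0

def ensemble_pseudo_label(labeled, unlabeled, n_models=5, min_agree=5):
    # Train a diverse ensemble, then pseudo-label the unlabeled data,
    # keeping only samples where at least `min_agree` models agree.
    models = [train_model(labeled, seed=s) for s in range(n_models)]
    pseudo = []
    for x in unlabeled:
        votes = Counter(predict(m, x) for m in models)
        label, count = votes.most_common(1)[0]
        if count >= min_agree:
            pseudo.append((x, label))
    return pseudo

labeled = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)]
# Clear-cut unlabeled samples get confident pseudo-labels; ambiguous
# ones near the decision boundary may be filtered out by disagreement.
pseudo = ensemble_pseudo_label(labeled, [0.05, 0.95, 0.5])
```

In a real ASR setting, the "models" would be DNN acoustic models decoding unlabeled speech, and agreement would be measured over hypothesized transcripts rather than single labels; the filtered pseudo-labeled data is then added to the training set.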
