Abstract

Exploiting abundant unlabeled speech is important for improving acoustic model training in automatic speech recognition (ASR). Semi-supervised training incorporates unlabeled data alongside labeled data to enhance model training, but it suffers from error-prone automatic labels. Ensemble training builds a set of models and combines them to make the resulting model more general and robust, but it has not previously been applied to unlabeled data. In this work, we propose an effective semi-supervised training method for deep neural network (DNN) acoustic models that exploits the diversity among an ensemble of models. The resulting model improves performance on a lecture transcription task. Moreover, the proposed method also shows potential for DNN adaptation.
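The core idea of combining ensemble diversity with semi-supervised training can be illustrated with a minimal sketch: train several diverse models on the labeled data, let them vote on the unlabeled data, and keep only the pseudo-labels on which the ensemble agrees. This is a hedged toy illustration of agreement-based pseudo-labeling, not the paper's actual DNN training procedure; the threshold "models", the `min_agree` parameter, and all function names are hypothetical stand-ins.

```python
import random
from collections import Counter

def train_model(labeled, seed):
    # Toy "model": a decision threshold on 1-D features, perturbed
    # by the seed to mimic the diversity of an ensemble of DNNs.
    random.seed(seed)
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return threshold + random.uniform(-0.05, 0.05)

def predict(threshold, x):
    return 1 if x >= threshold else 0

def ensemble_pseudo_label(labeled, unlabeled, n_models=5, min_agree=5):
    # Train a diverse ensemble, then pseudo-label the unlabeled data,
    # keeping only samples where at least `min_agree` models agree.
    models = [train_model(labeled, seed=s) for s in range(n_models)]
    pseudo = []
    for x in unlabeled:
        votes = Counter(predict(m, x) for m in models)
        label, count = votes.most_common(1)[0]
        if count >= min_agree:
            pseudo.append((x, label))
    return pseudo

labeled = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)]
# Clear-cut unlabeled samples get confident pseudo-labels; ambiguous
# ones near the decision boundary may be filtered out by disagreement.
pseudo = ensemble_pseudo_label(labeled, [0.05, 0.95, 0.5])
```

In a real ASR setting, the "models" would be DNN acoustic models decoding unlabeled speech, and agreement would be measured over hypothesized transcripts rather than single labels; the filtered pseudo-labeled data is then added to the training set.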
