Joint sequence training of phone and grapheme acoustic model based on multi-task learning deep neural networks

Dongpeng Chen,Brian Mak,Sunil Sivadas

doi:10.21437/interspeech.2014-279

Abstract

Multi-task learning (MTL) can be an effective way to improve the generalization performance of singly learning tasks if the tasks are related, especially when the amount of training data is small. Our previous work applied MTL to the joint training of triphone and trigrapheme acoustic models using deep neural networks (DNNs) for low-resource speech recognition. Significant recognition improvement over the performance of their DNNs trained by single-task learning (STL) was obtained. In that work, both STL-DNNs and MTL-DNNs were trained by minimizing the total frame-wise cross entropies. Since phoneme and grapheme recognition are inherently sequence classification tasks, here we study the effect of sequencediscriminative training on their joint estimation using MTLDNNs. Experimental evaluation on TIMIT phoneme recognition shows that joint sequence training outperforms frame-wise training of phone and grapheme MTL-DNNs significantly.

Full Text