Abstract

The shared-hidden-layer multilingual deep neural network (SHL-MDNN), in which the hidden layers of a feed-forward deep neural network (DNN) are shared across multiple languages while the softmax layers are language dependent, has been shown to be effective for acoustic modeling in multilingual low-resource speech recognition. In this paper, we propose that sharing hidden layers with Long Short-Term Memory (LSTM) recurrent neural networks can achieve further performance improvement, given that LSTMs have outperformed DNNs as acoustic models for automatic speech recognition (ASR). Moreover, we show that the shared-hidden-layer multilingual LSTM (SHL-MLSTM) with residual learning yields an additional moderate but consistent gain on multilingual tasks, since residual learning alleviates the degradation problem of deep LSTMs. Experimental results demonstrate that SHL-MLSTM reduces the word error rate (WER) relatively by 2.1-6.8% over an SHL-MDNN trained on six languages, and by 2.6-7.3% over monolingual LSTMs trained on the language-specific data, on the CALLHOME datasets. A further relative WER reduction of about 2% over SHL-MLSTM is obtained through residual learning on the CALLHOME datasets, which demonstrates that residual learning is useful for SHL-MLSTM in multilingual low-resource ASR.
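To make the architecture concrete, below is a minimal sketch of the shared-hidden-layer idea described above: a stack of LSTM layers shared across all languages, residual (shortcut) connections between layers, and one language-dependent softmax output layer per language. This is an illustrative assumption of the design, written in PyTorch; the class name, dimensions, and language labels are hypothetical and not taken from the paper.

```python
# Illustrative sketch (not the authors' implementation) of a
# shared-hidden-layer multilingual LSTM with residual learning.
import torch
import torch.nn as nn

class SHLMLSTM(nn.Module):
    def __init__(self, feat_dim, hidden_dim, num_layers, senones_per_lang):
        super().__init__()
        # Project input features to the hidden dimension so residual
        # additions are dimensionally consistent at every layer.
        self.input_proj = nn.Linear(feat_dim, hidden_dim)
        # Hidden LSTM layers: shared across all languages.
        self.layers = nn.ModuleList(
            [nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
             for _ in range(num_layers)]
        )
        # Output (softmax) layers: one per language.
        self.heads = nn.ModuleDict({
            lang: nn.Linear(hidden_dim, n_senones)
            for lang, n_senones in senones_per_lang.items()
        })

    def forward(self, feats, lang):
        x = self.input_proj(feats)      # (batch, time, hidden_dim)
        for lstm in self.layers:
            out, _ = lstm(x)
            x = x + out                 # residual connection around the layer
        return self.heads[lang](x)      # logits for the selected language

# Hypothetical usage: 40-dim acoustic features, two languages.
model = SHLMLSTM(feat_dim=40, hidden_dim=512, num_layers=4,
                 senones_per_lang={"EN": 3000, "MA": 3000})
logits = model(torch.randn(8, 100, 40), lang="EN")
```

During multilingual training, each minibatch would update the shared LSTM stack plus only the softmax head of the language it came from; the residual additions are what allow the stack to be made deeper without the degradation the abstract refers to.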
