Abstract

In recent years, speech recognition has made significant progress owing to the use of deep neural networks. However, because of noise and reverberation, the performance of far-field speech recognition is still unsatisfactory. Although weighted prediction error (WPE) combined with a deep neural network (DNN) can reduce noise and attenuate reverberation, it still has some shortcomings. In this work, we use a recurrent neural network with long short-term memory (LSTM) to predict the coefficients in WPE, which addresses these shortcomings and improves speech recognition performance. Experimental results on the CHiME-5 dataset show that the best model using the proposed method achieves a 2.1% absolute reduction in word error rate (WER) compared with the baseline system.
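
To show where a network-predicted quantity enters WPE, here is a minimal single-channel, single-frequency-bin sketch in Python/NumPy. It assumes, as in DNN-based WPE, that the network supplies the per-frame power lambda_t of the desired signal. That power weights the estimate of a linear prediction filter g = R^{-1} p, and the dereverberated output is X_hat_t = Y_t - g^H ytilde_t. The abstract does not say exactly which quantity the LSTM predicts, so treat this as an illustration, not the authors' implementation. The LSTM itself is not modeled here. The function name `wpe_single_bin` and the defaults `taps=10` and `delay=3` are hypothetical choices, not values from the paper.

```python
import numpy as np


def wpe_single_bin(Y, lam, taps=10, delay=3, eps=1e-10):
    """Single-channel WPE for one STFT frequency bin (illustrative sketch).

    Y   : complex STFT coefficients of one bin, shape (T,)
    lam : per-frame power of the desired signal, shape (T,). In DNN-based
          WPE this would be predicted by a network; here it is just an input.
    Returns the dereverberated coefficients and the prediction filter g.
    Assumes T > delay + taps.
    """
    T = Y.shape[0]
    # Row t of Ytilde holds the delayed past frames
    # Y[t-delay], ..., Y[t-delay-taps+1] (zero where no history exists yet).
    Ytilde = np.zeros((T, taps), dtype=complex)
    for k in range(taps):
        shift = delay + k
        Ytilde[shift:, k] = Y[:T - shift]

    # Per-frame weights 1 / lambda_t, floored to avoid division by zero.
    w = 1.0 / np.maximum(lam, eps)

    # Weighted correlation matrix R = sum_t w_t * ytilde_t ytilde_t^H
    # and correlation vector p = sum_t w_t * ytilde_t * conj(Y_t).
    weighted = Ytilde.T * w
    R = weighted @ Ytilde.conj()
    p = weighted @ Y.conj()

    # Filter minimizing sum_t |Y_t - g^H ytilde_t|^2 / lambda_t.
    g = np.linalg.solve(R, p)

    # Subtract the predicted late reverberation: X_hat_t = Y_t - g^H ytilde_t.
    X_hat = Y - Ytilde @ g.conj()
    return X_hat, g
```

For example, take a synthetic bin built as Y[t] = X[t] + 0.5 * Y[t-3] and pass the oracle power lambda_t = |X_t|^2 with `delay=3`. The first filter tap should then come out close to 0.5, and X_hat should be much closer to X than Y is. A better estimate of lambda_t gives a better filter, which is why the quality of the network's prediction matters.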
