Abstract

Deep neural networks (DNNs) have achieved great success in automatic speech recognition (ASR), and a DNN can be regarded as a joint model combining a nonlinear feature transformation with a log-linear classifier. Recently, DNNs have also been adopted as regression models to enhance distorted features in noisy conditions, and the enhanced features are then used to improve the performance of DNN-based ASR. Previous work predicts only a single frame (log-spectrum) with the enhancement DNN, and the resulting ASR improvement is small. In this paper, a local trajectory, represented by multiple frames with dynamic features, is predicted instead to make feature enhancement more stable. In addition, FBank features and a long context window are used to better integrate the enhancement DNN into the ASR DNN. Experiments on the Aurora4 corpus show that, compared with the standard DNN baseline, the proposed approach achieves a 9.6% relative WER reduction and also significantly outperforms the previously proposed DNN ASR system that uses log-spectrum feature based enhancement.
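
As an illustration of what "multiple frames with dynamic features" could look like as a regression target, the following is a minimal sketch, not the paper's implementation. It assumes that dynamic features are the usual regression-based deltas and delta-deltas, and that a local trajectory is formed by stacking a few neighbouring frames of these features. The function names, the delta width, and the context sizes are all hypothetical choices made for the example.

```python
import numpy as np

def deltas(feats, width=2):
    """Regression-based delta features over +/-width frames (edge frames are repeated)."""
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    out = np.zeros_like(feats, dtype=float)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + num_frames]    # frames t + k
        behind = padded[width - k : width - k + num_frames]   # frames t - k
        out += k * (ahead - behind)
    return out / denom

def stack_context(feats, left, right):
    """Concatenate each frame with its `left` previous and `right` following frames."""
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    blocks = [padded[i : i + num_frames] for i in range(left + right + 1)]
    return np.concatenate(blocks, axis=1)

def local_trajectory_targets(clean_feats, left=1, right=1):
    """Assumed target layout: static + delta + delta-delta, stacked over a short window."""
    d = deltas(clean_feats)
    dd = deltas(d)
    dynamic = np.concatenate([clean_feats, d, dd], axis=1)
    return stack_context(dynamic, left, right)

if __name__ == "__main__":
    # Random stand-in for clean FBank features: 100 frames, 40 filterbank channels (assumed size).
    clean = np.random.randn(100, 40)
    targets = local_trajectory_targets(clean, left=1, right=1)
    print(targets.shape)
```

Under these assumptions, a single-frame target corresponds to `left=0, right=0` without the delta streams. The trajectory target instead asks the enhancement network to predict several consecutive frames together with their dynamics, which is one way to read the stability argument made above.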
