Abstract

Even though deep neural network (DNN) acoustic models provide an increased degree of robustness in automatic speech recognition, performance still drops substantially in far-field speech recognition under reverberant and noisy conditions. In this study, we explore DNN adaptation techniques for improved robustness to environmental mismatch in far-field speech recognition. In contrast to many recent studies investigating the role of feature processing in DNN-HMM systems, we focus on adapting a clean-trained DNN model to speech captured by a distant-talking microphone in a target environment with substantial reverberation and noise. We show that significant performance gains can be obtained by discriminatively estimating a set of adaptation parameters to compensate for the mismatch between a clean-trained model and a small set of noisy and reverberant adaptation data. Using various adaptation strategies, we obtain relative word error rate improvements of up to 16% on the single-channel task of the recent ASpIRE challenge.
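To make the adaptation idea concrete, the sketch below illustrates one classic strategy of the kind described: a linear input network (LIN), i.e. a trainable linear layer initialized to identity, inserted in front of the frozen clean-trained acoustic model and estimated discriminatively (frame-level cross-entropy) on the small adaptation set. The abstract does not name the specific strategies used, so this is a minimal illustration, not the authors' implementation; the feature dimension, hyperparameters, and module names are assumptions.

```python
# Minimal LIN-style adaptation sketch (PyTorch). Assumptions, not the
# paper's method: 40-dim input features, frame-level senone targets,
# plain SGD, and a generic `clean_dnn` that maps features to logits.
import torch
import torch.nn as nn

FEAT_DIM = 40  # assumed log-mel filterbank dimension


class LinearInputNetwork(nn.Module):
    """Trainable linear transform inserted before the frozen clean DNN."""

    def __init__(self, feat_dim):
        super().__init__()
        self.linear = nn.Linear(feat_dim, feat_dim)
        # Identity initialization: before adaptation, the combined model
        # behaves exactly like the clean-trained model.
        nn.init.eye_(self.linear.weight)
        nn.init.zeros_(self.linear.bias)

    def forward(self, x):
        return self.linear(x)


def adapt(clean_dnn, adaptation_loader, epochs=5, lr=1e-3):
    """Estimate LIN parameters on adaptation data; the clean DNN is frozen."""
    for p in clean_dnn.parameters():
        p.requires_grad = False  # only the LIN is updated

    lin = LinearInputNetwork(FEAT_DIM)
    optimizer = torch.optim.SGD(lin.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()  # discriminative frame-level objective

    for _ in range(epochs):
        for feats, senone_targets in adaptation_loader:
            optimizer.zero_grad()
            logits = clean_dnn(lin(feats))  # gradients flow only into LIN
            loss = criterion(logits, senone_targets)
            loss.backward()
            optimizer.step()
    return lin
```

Restricting the update to a single identity-initialized layer keeps the number of estimated parameters small, which is what makes discriminative estimation feasible on a limited amount of noisy and reverberant adaptation data without overfitting.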
