Memory-Enhanced Neural Networks and NMF for Robust ASR

Jurgen T Geiger,Felix Weninger,Jort F Gemmeke,Gerhard Rigoll,Bjorn Schuller,Martin Wollmer

doi:10.1109/taslp.2014.2318514

Abstract

In this article we address the problem of distant speech recognition for reverberant noisy environments. Speech enhancement methods, e. g., using non-negative matrix factorization (NMF), are succesful in improving the robustness of ASR systems. Furthermore, discriminative training and feature transformations are employed to increase the robustness of traditional systems using Gaussian mixture models (GMM). On the other hand, acoustic models based on deep neural networks (DNN) were recently shown to outperform GMMs. In this work, we combine a state-of-the art GMM system with a deep Long Short-Term Memory (LSTM) recurrent neural network in a double-stream architecture. Such networks use memory cells in the hidden units, enabling them to learn long-range temporal context, and thus increasing the robustness against noise and reverberation. The network is trained to predict frame-wise phoneme estimates, which are converted into observation likelihoods to be used as an acoustic model. It is of particular interest whether the LSTM system is capable of improving a robust state-of-the-art GMM system, which is confirmed in the experimental results. In addition, we investigate the efficiency of NMF for speech enhancement on the front-end side. Experiments are conducted on the medium-vocabulary task of the 2nd ‘CHiME’ Speech Separation and Recognition Challenge, which includes reverberation and highly variable noise. Experimental results show that the average word error rate of the challenge baseline is reduced by 64% relative. The best challenge entry, a noise-robust state-of-the-art recognition system, is outperformed by 25% relative.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing	Publication Date: Jun 1, 2014
Citations: 72	License type: other-oa

R Discovery Prime

R Discovery Prime

Memory-Enhanced Neural Networks and NMF for Robust ASR

Abstract

Talk to us

Similar Papers

More From: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Lead the way for us

Similar Papers

Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network
Alex Sherstinsky
Physica D: Nonlinear Phenomena | VOL. 404
Alex SherstinskyAlex Sherstinsky
21 Jan 2020
Physica D: Nonlinear Phenomena | VOL. 404

Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks
Zhuo Chen ... John R Hershey
-
Zhuo Chen, et. al.Zhuo Chen ... John R Hershey
06 Sep 2015
06 Sep 2015

Kazakh and Russian Languages Identification Using Long Short-Term Memory Recurrent Neural Networks
Zhanibek Kozhirbayev ... Muslima Karabalayeva
-
Zhanibek Kozhirbayev, et. al.Zhanibek Kozhirbayev ... Muslima Karabalayeva
01 Sep 2017
01 Sep 2017

Performance analysis of neural network, NMF and statistical approaches for speech enhancement
Ravi Kumar Kandagatla ... Venkata Subbaiah Potluri
International Journal of Speech Technology | VOL. 23
Ravi Kumar Kandagatla, et. al.Ravi Kumar Kandagatla ... Venkata Subbaiah Potluri
17 Sep 2020
International Journal of Speech Technology | VOL. 23

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Memory-Enhanced Neural Networks and NMF for Robust ASR

Abstract

Talk to us

Similar Papers

More From: IEEE/ACM Transactions on Audio, Speech, and Language Processing