Abstract

Deep neural networks (DNNs) have gained remarkable success in speech recognition, partially attributed to the flexibility of DNN models in learning complex patterns of speech signals. This flexibility, however, may lead to serious over-fitting and hence severe performance degradation in adverse acoustic conditions, such as those with high ambient noise. We propose a noisy training approach to tackle this problem: by injecting moderate noise into the training data intentionally and randomly, more generalizable DNN models can be learned. This ‘noise injection’ technique, although already known to the neural computation community, has not been studied with DNNs, which involve a highly complex objective function. The experiments presented in this paper confirm that the noisy training approach works well for the DNN model and can provide substantial performance improvement for DNN-based speech recognition.

Highlights

  • A modern automatic speech recognition (ASR) system involves three components: an acoustic feature extractor to derive representative features for speech signals, an emission model to represent static properties of the speech features, and a transitional model to depict dynamic properties of speech production.

  • The dominant acoustic features in ASR are based on short-time spectral analysis, e.g., Mel frequency cepstral coefficients (MFCC).

  • The noisy training approach proposed in this paper was highly motivated by the noise injection theory, which has been known for decades in the neural computing community [31,32,33,34]. This paper employs this theory and contributes in two aspects: first, we examine the behavior of noise injection in DNN training, a more challenging task involving a huge number of parameters; second, we study mixtures of multiple noises at various levels of signal-to-noise ratio (SNR), which is beyond the conventional noise injection theory that assumes small and Gaussian-like injected noises.
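The mixing of noise at a chosen SNR mentioned in the highlights above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline; `mix_at_snr` is a hypothetical helper that scales a noise signal so the clean-to-noise power ratio matches a target SNR, with the SNR drawn at random per utterance as in noisy training.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean/noise power ratio equals `snr_db` (in dB),
    then add it to the clean signal. Hypothetical helper for illustration."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Noise power required to reach the requested SNR
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    noise_scaled = noise * np.sqrt(target_noise_power / noise_power)
    return clean + noise_scaled

rng = np.random.default_rng(0)
# Stand-ins for a clean utterance and a noise recording (1 s at 16 kHz)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)

# Pick an SNR level at random per utterance, as in noisy training
snr = rng.choice([0, 5, 10, 20])
noisy = mix_at_snr(clean, noise, snr)
```

In a real system the noise would come from recorded noise corpora (e.g., babble or car noise) rather than white Gaussian samples, and the SNR range would be chosen to bracket the expected deployment conditions.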


Summary

Introduction

A modern automatic speech recognition (ASR) system involves three components: an acoustic feature extractor to derive representative features for speech signals, an emission model to represent static properties of the speech features, and a transitional model to depict dynamic properties of speech production. The idea of noisy training is simple: by injecting noise into the input speech data when conducting DNN training, the noise patterns are expected to be learned, and the generalization capability of the resulting network is expected to be improved. Both effects may improve the robustness of DNN-based ASR systems in noisy conditions. If the training is based on clean speech only, the flexibility provided by the DNN structure is largely wasted. This is because the phone class boundaries are relatively clear with clean speech, so the abundant parameters of DNNs tend to learn the nuanced variations of phone realizations, conditioned on a particular type of channel and/or background noise. In noisy training, the noise-corrupted speech is fed into the DNN input units to conduct model training.
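The training procedure described above, where fresh noise corrupts the inputs at every pass, can be sketched with a toy classifier. This is a minimal sketch under stated assumptions, not the paper's DNN: a logistic-regression "frame classifier" on synthetic features stands in for the network, and the Gaussian perturbation with a fixed scale of 0.1 stands in for the injected acoustic noise. The key point is that a new noise realization is drawn every epoch, so the model never sees exactly the same corrupted input twice.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-class "frame classification" data standing in for speech features:
# 200 frames of 20-dimensional features, label determined by two dimensions
X = rng.standard_normal((200, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

W = np.zeros(20)
b = 0.0
lr = 0.1

for epoch in range(50):
    # Noise injection: corrupt the inputs with fresh random noise each epoch
    X_noisy = X + 0.1 * rng.standard_normal(X.shape)
    z = X_noisy @ W + b
    p = 1.0 / (1.0 + np.exp(-z))          # sigmoid output
    # Gradient of the cross-entropy loss w.r.t. W and b
    grad_W = X_noisy.T @ (p - y) / len(y)
    W -= lr * grad_W
    b -= lr * np.mean(p - y)
```

Although trained only on corrupted inputs, the model is evaluated on clean data; the injected noise acts as a regularizer that discourages fitting nuanced, condition-specific variations.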

Experiments
Findings
Conclusions
Full Text
