Noisy Speech Recognition Research Articles

We propose a multi-target, signal-to-noise-ratio (SNR)-progressive learning (SNR-PL) framework for regression-based speech enhancement (SE). At low SNR levels, the complicated regression required in SE is often difficult to learn directly. We therefore decompose the original SE problem of mapping noisy to clean speech features, which spans a large SNR gap, into a series of sub-problems, each with a small SNR increment and presumably easier to learn. In our configurations, each hidden layer of the proposed regression neural network is guided to explicitly learn an intermediate target with a specified, small SNR gain. Tested on both deep neural network (DNN) and long short-term memory (LSTM) architectures, SNR-PL consistently outperforms the conventional “black box” DNN framework in terms of both objective measures and model compactness. Furthermore, with the best-configured LSTM-based SNR-PL model, we often observe that performance saturates or even degrades as the number of intermediate targets increases, because useful information is lost in dimension reduction when more target layers are involved. To address this information loss, we explore densely connected networks on top of the LSTM structure, in which the input and the preceding intermediate targets are concatenated to learn the next target. Finally, to fully exploit the rich and complementary information in the intermediate targets, a simple post-processing strategy is adopted to further improve performance. Experimental results on simulated speech data under unseen noise conditions demonstrate that the proposed approach consistently outperforms the conventional LSTM approach on objective speech enhancement measures of speech intelligibility and quality.
Furthermore, when evaluated on real data from the CHiME-4 Challenge for automatic speech recognition (ASR) of noisy microphone-array speech, the proposed approach with intermediate outputs directly improves ASR performance, whereas the conventional LSTM approach increases the word error rate.
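
To make the progressive-target idea concrete, the sketch below builds time-domain intermediate targets whose SNR rises by a fixed step, forms the densely connected input for the next target, and sums a per-target regression loss. It is a minimal sketch under stated assumptions, not the authors' implementation: the additive mixing model, the 10 dB step, the equal loss weights, and every function name (`mix_at_snr`, `progressive_targets`, `dense_inputs`, `multi_target_mse`) are hypothetical choices made here. In a real system, the features the network regresses would be computed from each target before training.

```python
import numpy as np

# Hypothetical helpers sketching SNR-progressive targets; the additive mixing
# model and the 10 dB step are illustrative assumptions, not the paper's code.

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR (dB)."""
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled_noise = scale * noise
    return clean + scaled_noise, scaled_noise

def measured_snr_db(clean, residual):
    """SNR of a signal decomposed as clean + residual, in dB."""
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))

def progressive_targets(clean, scaled_noise, gain_db=10.0, num_intermediate=2):
    """Intermediate targets with SNR = input SNR + k * gain_db, then the clean target."""
    targets = []
    for k in range(1, num_intermediate + 1):
        # Amplitude factor 10**(-k*gain/20) lowers the noise power by k*gain dB.
        targets.append(clean + 10.0 ** (-k * gain_db / 20.0) * scaled_noise)
    targets.append(clean.copy())  # final target: clean speech
    return targets

def dense_inputs(noisy_feat, previous_estimates):
    """Densely connected input: noisy features concatenated with all earlier estimates."""
    return np.concatenate([noisy_feat, *previous_estimates], axis=-1)

def multi_target_mse(estimates, targets, weights=None):
    """Weighted sum of per-target mean squared errors (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(estimates)
    return sum(w * np.mean((e - t) ** 2) for w, e, t in zip(weights, estimates, targets))

# Usage: a -5 dB mixture yields intermediate targets at +5 dB and +15 dB, then clean speech.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy, scaled_noise = mix_at_snr(clean, rng.standard_normal(16000), snr_db=-5.0)
for t in progressive_targets(clean, scaled_noise)[:-1]:
    print(round(measured_snr_db(clean, t - clean), 2))
```

Scaling the noise amplitude by 10^(-k·g/20) lowers its power by k·g dB, which is why each intermediate target lands exactly k·g dB above the input SNR while the speech component stays untouched.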

In this paper, factored front-end CMLLR (F-FE-CMLLR) is investigated for joint speaker and environment normalization in the framework of DNN-HMM acoustic modeling. It is a feature-space transform composed of front-end CMLLR for environment normalization and global CMLLR for speaker normalization. The transform is applied to the noisy, speaker-independent input features, and the resulting canonical features are passed to the DNN-HMM for training and decoding. Two estimation procedures for F-FE-CMLLR are investigated: sequential and iterative training. A key attribute of F-FE-CMLLR is that, under iterative training, it is likely to foster acoustic factorization, which enables more effective transfer of the environment transform from one condition to another. Moreover, because it is a feature-space transform, it is straightforward to use in DNN-HMM acoustic modeling. The proposed scheme is evaluated on the Aurora-4 noisy speech recognition task, whose dominant acoustic factors are microphone variability, additive noise at varying SNRs, and speaker variability. F-FE-CMLLR yields a large improvement over the baseline features, which are processed with CMLLR for speaker adaptive training (SAT), and the improvement is observed in every acoustic condition in the test sets. Iterative training of F-FE-CMLLR also outperforms sequential training under all test conditions. Specifically, when all three types of acoustic conditions co-exist, sequential training yields a 13% relative improvement over SAT features, and iterative training provides a further gain on top, amounting to an 18% relative improvement overall. It is argued that the improvement over sequential training stems from acoustic factorization that holds implicitly.
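
Why a factored feature-space transform is easy to use with a DNN-HMM can be seen with plain linear algebra: two affine stages collapse into a single affine map applied once to the features, and the log-determinant (Jacobian) term that CMLLR adds to the likelihood splits into a sum over the stages. The sketch below assumes, for simplicity, that each stage is one global affine transform and that the environment stage is applied before the speaker stage; the front-end CMLLR described in the paper may be more elaborate, and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical helpers: each stage is modeled as a single global affine
# transform x -> A @ x + b (a simplification of F-FE-CMLLR).

def compose_affine(A_env, b_env, A_spk, b_spk):
    """Fold environment-then-speaker normalization into one affine map (A, b).

    A_spk @ (A_env @ x + b_env) + b_spk == (A_spk @ A_env) @ x + (A_spk @ b_env + b_spk)
    """
    A = A_spk @ A_env
    b = A_spk @ b_env + b_spk
    return A, b

def apply_affine(A, b, X):
    """Apply x -> A @ x + b to every row of a (frames, dim) feature matrix."""
    return X @ A.T + b

def log_abs_det(A):
    """log|det A|, the Jacobian term a feature-space transform adds to the log-likelihood."""
    _, logdet = np.linalg.slogdet(A)
    return logdet

# Usage with random, well-conditioned transforms on 5-dimensional features.
rng = np.random.default_rng(0)
d = 5
A_env = np.eye(d) + 0.1 * rng.standard_normal((d, d)); b_env = rng.standard_normal(d)
A_spk = np.eye(d) + 0.1 * rng.standard_normal((d, d)); b_spk = rng.standard_normal(d)
X = rng.standard_normal((100, d))  # 100 frames of noisy features

A, b = compose_affine(A_env, b_env, A_spk, b_spk)
two_step = apply_affine(A_spk, b_spk, apply_affine(A_env, b_env, X))
print(np.allclose(apply_affine(A, b, X), two_step))
print(np.isclose(log_abs_det(A), log_abs_det(A_spk) + log_abs_det(A_env)))
```

Keeping the two stages separate during estimation while applying their composition at run time is what allows, in principle, an environment transform to be paired with a different speaker transform.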

Articles published on Noisy Speech Recognition

Task-Oriented Document-Grounded Dialog Systems by HLTPR@RWTH for DSTC9 and DSTC10

End-to-End Noisy Speech Recognition Using Fourier and Hilbert Spectrum Features

A Multi-Target SNR-Progressive Learning Approach to Regression Based Speech Enhancement

Feature Level Solution to Noise Robust Speech Recognition in the context of Tonal Languages

Japanese speech intelligibility estimation and prediction using objective intelligibility indices under noisy and reverberant conditions

Wiener Filter in Wavelet Domain for Mel-LPC based Noisy Speech Recognition

Noisy Speech Recognition by Mel-LPC based AR-HMM with Power and Time Derivative Parameters

Factored front-end CMLLR for joint speaker and environment normalization under DNN-HMM

Phoneme class based feature adaptation for mismatch acoustic modeling and recognition of distant noisy speech

Autocorrelation-based noise subtraction method with smoothing, overestimation, energy, and cepstral mean and variance normalization for noisy speech recognition

Bayesian feature enhancement using independent vector analysis and reverberation parameter re-estimation for noisy reverberant speech recognition

Prior-based Binary Masking and Discriminative Methods for Reverberant and Noisy Speech Recognition Using Distant Stereo Microphones

Auditory driven subband speech enhancement for automatic recognition of noisy speech

Improvement of the noisy speech recognition using method of thresholds and emphasizing wavelet coefficients for clean speech

Nanophotonic reservoir computing for noisy speech recognition

Feature Extraction Method for Improving Speech Recognition in Noisy Environments

Performance estimation of noisy speech recognition using spectral distortion and recognition task complexity

Speech Recognition using ERB-like Admissible Wavelet Packet Decomposition based on Perceptual sub-band Weighting

Noise Robust Speech Parameterization using Relative Spectra and Auditory Filterbank

Laplace Group Sensing for Acoustic Models