Relative Word Error Reduction Research Articles

A blind dereverberation method based on power spectral subtraction (SS) using a multi-channel least mean squares algorithm was previously proposed to suppress reverberation in speech that contains no additive noise. Isolated word speech recognition experiments showed that this method achieved significant improvements over conventional cepstral mean normalization (CMN) in a reverberant environment. In this paper, we propose a blind dereverberation method based on generalized spectral subtraction (GSS), which has been shown to be effective for noise reduction, instead of power SS. Furthermore, we extend the missing feature theory (MFT), which was initially proposed to improve robustness against additive noise, to dereverberation. A one-stage dereverberation and denoising method based on GSS is presented to simultaneously suppress both additive noise and nonstationary multiplicative noise (reverberation). The proposed dereverberation method based on GSS with MFT is evaluated on a large vocabulary continuous speech recognition task. In the absence of additive noise, the dereverberation method based on GSS with MFT using only 2 microphones achieves relative word error reduction rates of 11.4% and 32.6% compared to the dereverberation method based on power SS and the conventional CMN, respectively. For reverberant and noisy speech, the dereverberation and denoising method based on GSS achieves a relative word error reduction rate of 12.8% compared to the conventional CMN combined with a GSS-based additive noise reduction method. We also analyze the factors that affect compensation parameter estimation for the SS-based dereverberation method, such as the number of channels (the number of microphones), the length of reverberation to be suppressed, and the length of the utterance used for parameter estimation. The experimental results show that the SS-based method is robust in a variety of reverberant environments for both isolated and continuous speech recognition and under various parameter estimation conditions.
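
As a rough illustration of the difference between power SS and GSS mentioned above, the sketch below applies a commonly used form of generalized spectral subtraction to a single frame of magnitude spectra. It is not the authors' implementation: the function name, the parameters `exponent`, `alpha`, and `beta`, and the flooring rule are all assumptions made for the example, and the interference estimate (late reverberation and/or noise) is taken as given rather than estimated by a multi-channel LMS algorithm.

```python
def generalized_spectral_subtraction(observed_mag, interference_mag,
                                     exponent=2.0, alpha=1.0, beta=0.1):
    """Subtract an interference estimate from one frame of magnitude spectra.

    Hypothetical sketch: exponent=2.0 corresponds to power spectral
    subtraction, and other positive exponents give a generalized form.
    alpha (over-subtraction factor) and beta (spectral floor) are
    illustrative parameters, not values taken from the paper.
    """
    if len(observed_mag) != len(interference_mag):
        raise ValueError("spectra must have the same number of bins")

    cleaned = []
    for x, d in zip(observed_mag, interference_mag):
        x_pow = x ** exponent
        # Subtract the scaled interference estimate in the exponent domain.
        diff = x_pow - alpha * (d ** exponent)
        # Floor the result so it never becomes negative.
        floored = max(diff, beta * x_pow)
        # Map back to the magnitude domain.
        cleaned.append(floored ** (1.0 / exponent))
    return cleaned


# Toy frame with two frequency bins (made-up values).
frame = generalized_spectral_subtraction([2.0, 1.0], [1.0, 1.0],
                                         exponent=2.0, alpha=1.0, beta=0.01)
```

In the second bin the subtraction would reach zero, so the floor term determines the output; this flooring is one common way to keep the estimate non-negative.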

Traditionally, noise reduction methods for additive noise have been quite different from those for reverberation. In this study, we investigated the effect of additive noise and reverberation on speech on the basis of the concept of temporal modulation transfer. We first analyzed the effect of noise on the temporal modulation of speech. Then, on the basis of this analysis, we proposed a two-stage processing algorithm that adaptively normalizes the temporal modulation of speech to extract robust speech features for automatic speech recognition. In the first stage of the proposed algorithm, the temporal modulation contrast of the cepstral time series for both clean and noisy speech is normalized. In the second stage, the contrast-normalized temporal modulation spectrum is smoothed in order to reduce the artifacts due to noise while preserving the information in the speech modulation events (edges). We tested our algorithm in speech recognition experiments under additive-noise, reverberant, and noisy (both additive noise and reverberation) conditions using the AURORA-2J data corpus. Our results showed that, as part of a uniform processing framework, the algorithm helped achieve the following: (1) for the additive noise condition, a 55.85% relative word error reduction (RWER) rate when clean-condition training was performed, and a 41.64% RWER rate when multi-condition training was performed, (2) for the reverberant condition, a 51.28% RWER rate, and (3) for the noisy condition (both additive noise and reverberation), a 95.03% RWER rate. In addition, we evaluated the performance of each stage of the proposed algorithm in AURORA-2J and AURORA4 experiments, and compared the performance of our algorithm with that of two similar processing algorithms in the second stage. The evaluation results further confirmed the effectiveness of our proposed algorithm.
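
The relative word error reduction figures quoted in these abstracts follow the usual definition: the drop in word error rate expressed as a percentage of the baseline word error rate. A minimal sketch is given below; the function name and the example error rates are hypothetical and are not taken from either paper.

```python
def relative_word_error_reduction(baseline_wer, improved_wer):
    """Return the relative word error reduction in percent.

    Both arguments are word error rates in the same unit (for example,
    percent). The baseline must be positive for the ratio to be defined.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# Made-up example: a baseline WER of 20% reduced to 15%.
rwer = relative_word_error_reduction(20.0, 15.0)
print(f"{rwer:.1f}% relative word error reduction")
```

Because the measure is relative to the baseline, the same absolute improvement yields a larger figure when the baseline error rate is already low.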

Articles published on Relative Word Error Reduction

Gaussian mixture models for adaptation of deep neural network acoustic models in automatic speech recognition systems

Use of Micro-Modulation Features in Large Vocabulary Continuous Speech Recognition Tasks

Speeding up deep neural network based speech recognition systems

The Deep Tensor Neural Network With Applications to Large Vocabulary Speech Recognition

Dereverberation and denoising based on generalized spectral subtraction by multi-channel LMS algorithm using a small-scale microphone array

Temporal contrast normalization and edge-preserved smoothing of temporal modulation structures of speech for robust speech recognition

Robust Speech Recognition by Using Compensated Acoustic Scores
