Abstract

This paper presents a novel noise-robust feature extraction method for speech recognition that uses the robust perceptual minimum variance distortionless response (MVDR) spectrum of a temporally filtered autocorrelation sequence. The perceptual MVDR spectrum of the filtered short-time autocorrelation sequence can reduce the effects of the nonstationary additive noise residue that remains after the autocorrelation is filtered. To achieve a more robust front-end, we also modify the robust distortionless constraint of the MVDR spectral estimation method by revising the weighting of the subband power spectrum values based on the subband signal-to-noise ratios (SNRs), which adapts the constraint to the proposed approach. The new weighting function passes the components of the input signal at the frequencies least affected by noise with larger weights and attenuates the noisy, undesired components more effectively. This modification reduces the noise residuals in the spectrum estimated from the filtered autocorrelation sequence, leading to a more robust algorithm. When evaluated on the Aurora 2 recognition task, the proposed method outperformed the Mel frequency cepstral coefficients (MFCC) baseline, relative autocorrelation sequence MFCC (RAS-MFCC), and the MVDR-based features in several different noisy conditions.
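The two building blocks named in the abstract, temporal filtering of the short-time autocorrelation sequence and MVDR spectral estimation, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `frame_autocorr`, `temporal_filter`, and `mvdr_spectrum`, the delta-style filter with `half_width=2`, and the diagonal `loading` term are all assumptions made for illustration. The perceptual warping and the SNR-dependent weighting are not modelled. The MVDR step uses the standard form P(w) = 1 / (v^H R^-1 v), with R the Toeplitz autocorrelation matrix.

```python
import numpy as np

def frame_autocorr(frames, max_lag):
    """Biased short-time autocorrelation r_m(k), k = 0..max_lag, for each frame."""
    n = frames.shape[1]
    return np.array([[np.dot(f[:n - k], f[k:]) / n for k in range(max_lag + 1)]
                     for f in frames])

def temporal_filter(acs, half_width=2):
    """Delta-style FIR filter along the frame axis (assumed filter shape).

    Its taps sum to zero, so any component of the autocorrelation that is
    constant across frames (e.g. stationary additive noise) is cancelled.
    """
    acs = np.asarray(acs, dtype=float)
    n_frames = acs.shape[0]
    taps = np.arange(-half_width, half_width + 1)
    padded = np.pad(acs, ((half_width, half_width), (0, 0)), mode="edge")
    out = np.zeros_like(acs)
    for t in taps:
        start = half_width + t
        out += t * padded[start:start + n_frames]
    return out / np.sum(taps ** 2)

def mvdr_spectrum(r, n_freqs=256, loading=1e-6):
    """MVDR spectrum P(w) = 1 / (v^H R^-1 v) from autocorrelation lags r[0..M]."""
    r = np.asarray(r, dtype=float)
    order = len(r) - 1
    lags = np.arange(order + 1)
    # Toeplitz matrix R[i, j] = r[|i - j|], with small diagonal loading (assumption)
    R = r[np.abs(np.subtract.outer(lags, lags))]
    R = R + loading * abs(r[0]) * np.eye(order + 1)
    R_inv = np.linalg.inv(R)
    omegas = np.linspace(0.0, np.pi, n_freqs)
    V = np.exp(1j * np.outer(lags, omegas))  # steering vectors, one per column
    denom = np.real(np.sum(V.conj() * (R_inv @ V), axis=0))
    return omegas, 1.0 / denom
```

A temporally filtered autocorrelation sequence is not guaranteed to yield a positive definite matrix, so passing it directly to `mvdr_spectrum` may require extra safeguards that this sketch does not attempt.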

Highlights

  • Speech recognition systems are usually trained in clean conditions and tested in different environments

  • We modify the weighting function proposed in [11] to adapt it to the proposed approach and to improve recognition accuracy at both high and low signal-to-noise ratios (SNRs). The modification adapts the weighting function to the perceptual spectrum of the temporally filtered autocorrelation sequence, which has a higher subband SNR than the unfiltered case

  • The perceptual MVDR spectrum of this filtered autocorrelation sequence is used to further reduce the noise residuals. This idea led to the perceptual MVDR spectrum of relative autocorrelation sequence (PMSR) features, which perform better than RAS-Mel frequency cepstral coefficients (MFCC) in all clean and noisy cases


Summary

Introduction

Speech recognition systems are usually trained in clean conditions and tested in different environments (clean and noisy). Robust speech recognition is considered one of the most challenging areas in speech processing technology, since the type of noise encountered in test conditions is usually not predictable. Robust speech recognition methods may be classified into four main categories [1]. The main purpose of the first category is to find a set of parameters that are robust against the variations that different noises introduce into speech signals. This category can itself be further divided into two main divisions:
