A perceptual masking approach for noise robust speech recognition

Hari Krishna Maganti, Marco Matassoni

doi:10.1186/1687-4722-2012-29

Open Access

https://doi.org/10.1186/1687-4722-2012-29

Abstract

This article describes a modified technique for enhancing noisy speech to improve automatic speech recognition (ASR) performance. The proposed approach improves on the widely used spectral subtraction method, which inherently suffers from musical noise artifacts. Psychoacoustic masking combined with critical band variance normalization minimizes the artifacts produced by spectral subtraction, improving ASR accuracy. The popular advanced ETSI-2 front end is tested for comparison. Speech recognition evaluations on the standard noisy AURORA-2 tasks show improved performance for all noise conditions.

Highlights

Enhancement of noise-corrupted speech signals is a challenging task for speech processing systems deployed in real-world applications.
Apart from extracting robust features, which represent parameters less sensitive to noise by modifying the extracted features [1], other research directions aimed at increasing the performance of speech recognizers in noise are speech signal enhancement, model adaptation, and hybrid methods [2,3,4,5].
Spectral subtraction was used to reduce the broadband noise due to peaks, and the combination of masking and variance normalization was effective in reducing the resulting artifacts by reducing the dynamic range of the magnitude spectrum, which improved speech recognition performance.
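To make the starting point concrete, the following is a minimal sketch of textbook power spectral subtraction for a single frame. It is not the authors' implementation: the function name `spectral_subtract`, the over-subtraction factor `alpha`, and the spectral floor `beta` are illustrative assumptions. Bins where the noise estimate exceeds the observed power are clamped to a small floor, and those isolated clamped bins are what tend to be heard as musical noise.

```python
def spectral_subtract(noisy_power, noise_power, alpha=2.0, beta=0.01):
    """Generic power spectral subtraction for one frame (illustrative sketch).

    noisy_power: power spectrum of the noisy frame, one value per bin.
    noise_power: estimated noise power spectrum, same length.
    alpha: over-subtraction factor (assumed value, not from the article).
    beta: spectral floor relative to the noise estimate (assumed value).
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("noisy_power and noise_power must have the same length")

    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        diff = y - alpha * n   # can be negative when the noise is overestimated
        floor = beta * n       # small positive floor replaces negative values
        cleaned.append(diff if diff > floor else floor)
    return cleaned


if __name__ == "__main__":
    # Second bin: the noise estimate exceeds the observation, so it is floored.
    print(spectral_subtract([10.0, 1.0], [2.0, 2.0], alpha=2.0, beta=0.1))
```

The floor keeps the estimate non-negative, but it does not remove the random bin-to-bin fluctuation that the masking and variance normalization steps are meant to address.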

Summary

Introduction

Enhancement of noise-corrupted speech signals is a challenging task for speech processing systems deployed in real-world applications. The use of human auditory masking in Kalman filtering for speech enhancement is considered in [9]. Another approach, based on sub-band variance normalization, was proposed in which speech frames are characterized by high variance and noise frames by low variance; the low-variance noise frames are suppressed to improve ASR performance in the presence of both additive noise and reverberation [10].

The non-linear mapping of spectral estimates that fall below a threshold, where noise has been overestimated, results in some randomly located negative values for the estimated clean speech magnitude. This leads to undesired residual noise called musical noise (a narrow-band spectrum with tones randomly distributed over time and frequency).

After the global masking threshold is applied, variance normalization is performed to further suppress the tones at random frequencies. Later, these normalized values are used as weights, which are multiplied with the filter bank energies.
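The weighting step can be illustrated with a small sketch. The summary does not give the exact normalization formula, so everything here is an assumption made for illustration: the function names `band_variance_weights` and `apply_weights`, the use of population variance within each band, and the normalization by the largest band variance in the frame. The idea shown is only the general one described above: bands with high variance (speech-like) keep their energy, while bands with low variance (noise-like) are attenuated.

```python
from statistics import pvariance


def band_variance_weights(band_bins):
    """Turn per-band variances into weights in [0, 1] (hypothetical scheme).

    band_bins: list of lists; each inner list holds the spectral values
    that fall inside one critical band for the current frame.
    """
    variances = [pvariance(b) if len(b) > 0 else 0.0 for b in band_bins]
    peak = max(variances) if variances else 0.0
    if peak == 0.0:
        # Completely flat frame: leave the energies unchanged.
        return [1.0] * len(variances)
    return [v / peak for v in variances]


def apply_weights(filterbank_energies, weights):
    """Multiply each filter bank energy by its band weight."""
    if len(filterbank_energies) != len(weights):
        raise ValueError("one weight is required per filter bank channel")
    return [e * w for e, w in zip(filterbank_energies, weights)]


if __name__ == "__main__":
    bands = [[1.0, 1.0, 1.0], [0.0, 2.0], [0.0, 4.0]]
    weights = band_variance_weights(bands)
    print(weights)
    print(apply_weights([10.0, 8.0, 2.0], weights))
```

In this toy frame the first band is flat, so its energy is suppressed entirely, while the band with the largest variance passes through unchanged. A practical system would likely smooth the weights over time and avoid hard zeros.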

Conclusions

Journal: EURASIP Journal on Audio, Speech, and Music Processing
Publication Date: Dec 1, 2012
Citations: 18
License type: CC BY 2.0