Abstract

One of the main challenges in Automatic Speech Recognition (ASR) is developing solutions that also work properly in adverse acoustic conditions, such as in the presence of additive noise and/or in reverberant rooms. Recently, some attention has been paid to integrating the noise suppressor deeply into the feature extraction pipeline. In this paper, different single-channel MMSE-based noise reduction schemes are implemented in both the frequency and cepstral domains, and the related recognition performance is evaluated on the AURORA2 and AURORA4 databases, thereby providing a useful reference for the scientific community.

Highlights

  • Automatic Speech Recognition (ASR) is a challenging task that the scientific community has addressed extensively over the last two decades

  • The approach followed is similar to the Ephraim and Malah (E&M) algorithm [4], but differs in two respects: the algorithm is applied to the power spectral magnitude of the filter bank’s output instead of the DFT spectral amplitude, and the noise variance takes into account the phase difference between the noise and the clean speech

  • Frequency-domain results on AURORA2 show that the LSA algorithm produces a remarkable improvement in recognition accuracy, and that the Global SNR (gSNR) modification gives a further increase of about 2% on average


Summary

Introduction

Automatic Speech Recognition (ASR) is a challenging task that the scientific community has addressed extensively over the last two decades. In recent years, notable interest has arisen in the study and development of solutions that are robust to acoustic nonidealities [1], for example background noise, simultaneous speakers, and reverberation. As a result of these efforts, an extensive literature on environment-robust ASR techniques has emerged. As highlighted in [2], these techniques can be classified into feature-domain (FD) and model-based (MB) algorithms. The latter class encompasses all methodologies that adapt the acoustic model (HMM) parameters in order to maximize the match between the system and the distorted environment. We can cite the log-spectral amplitude MMSE suppression rules, which are effective at reducing noise at the cost of a low level of distortion [4, 5]. These rules have been implemented in the cepstral domain, thus operating close to the back-end [6, 7].
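
To make the log-spectral amplitude (LSA) suppression rule mentioned above more concrete, the following is a minimal sketch of the classical DFT-domain gain of Ephraim and Malah, G = ξ/(1+ξ) · exp(½·E1(v)) with v = ξγ/(1+ξ), where ξ is the a priori SNR, γ is the a posteriori SNR, and E1 is the exponential integral. It is not the filter-bank or cepstral-domain variant studied in the paper, and it does not include the gSNR modification. The function names `exp1` and `lsa_gain` are illustrative choices, and E1 is evaluated with a standard series and continued-fraction scheme so that only the Python standard library is needed.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x: float) -> float:
    """Exponential integral E1(x) for x > 0 (series for x <= 1, continued fraction otherwise)."""
    if x <= 0.0:
        raise ValueError("this sketch only handles x > 0")
    if x <= 1.0:
        # E1(x) = -gamma - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total = -EULER_GAMMA - math.log(x)
        term = 1.0
        for k in range(1, 60):
            term *= -x / k  # term now equals (-x)^k / k!
            total -= term / k
        return total
    # Continued fraction evaluated with the modified Lentz method.
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(-x)


def lsa_gain(xi: float, gamma: float) -> float:
    """Classical LSA gain for a priori SNR xi and a posteriori SNR gamma (linear scale, both > 0)."""
    v = xi * gamma / (1.0 + xi)
    return xi / (1.0 + xi) * math.exp(0.5 * exp1(v))


if __name__ == "__main__":
    # Assumed example values: the gain is applied per frequency bin to the noisy amplitude.
    for xi, gamma in [(0.1, 1.5), (1.0, 2.0), (10.0, 11.0)]:
        print(f"xi={xi:5.1f} gamma={gamma:5.1f} gain={lsa_gain(xi, gamma):.4f}")
```

In a complete enhancer, ξ and γ would be estimated frame by frame (for example with a decision-directed rule) before the gain is applied; that estimation step is outside the scope of this sketch.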

Background on Frequency-Domain MMSE Algorithms
Gain Modification Based on Soft Decision
Background on Cepstral Domain MMSE Algorithms
Cepstral Domain Gain Modification Based on Soft Decision
Computer Simulations
Conclusions
