Abstract
A major challenge in Automatic Speech Recognition (ASR) is developing solutions that work reliably in adverse acoustic conditions, such as in the presence of additive noise and/or in reverberant rooms. Recently, attention has turned to integrating the noise suppressor deeply into the feature extraction pipeline. In this paper, different single-channel MMSE-based noise reduction schemes are implemented in both the frequency and cepstral domains, and their recognition performance is evaluated on the AURORA2 and AURORA4 databases, providing a useful reference for the scientific community.
Highlights
Automatic Speech Recognition (ASR) is a challenging task that the scientific community has addressed extensively over the last two decades
The approach followed is similar to the Ephraim and Malah (E&M) algorithm [4], but differs in two respects: the algorithm is applied to the power spectral magnitude of the filter bank's output rather than to the DFT spectral amplitude, and the noise variance takes into account the phase difference between the noise and the clean speech
Frequency-domain results on AURORA2 show that the LSA algorithm produces a remarkable improvement in recognition accuracy, and that the Global SNR (gSNR) modification gives a further increase of about 2% on average
Summary
Automatic Speech Recognition (ASR) is a challenging task that the scientific community has addressed extensively over the last two decades. In recent years, notable interest has arisen in the study and development of solutions that are robust to acoustic nonidealities [1], for example background noise, simultaneous speakers, and reverberation. As a result of these efforts, an extensive literature on environment-robust ASR techniques has emerged. As highlighted in [2], these techniques can be classified into feature-domain (FD) and model-based (MB) algorithms. The latter class encompasses all methodologies that adapt the acoustic model (HMM) parameters in order to maximize the system's match to the distorted environment. We can cite the log-spectral amplitude MMSE suppression rules for their effectiveness in reducing noise at the cost of a low level of distortion [4, 5]. These rules have been implemented in the cepstral domain, thus working close to the backend [6, 7].
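To make the log-spectral amplitude (LSA) suppression rule mentioned above more concrete, the following is a minimal sketch of the classic Ephraim and Malah LSA gain in its textbook form for DFT spectral amplitudes. It is not the filter-bank or cepstral-domain variant studied in this paper. The function names `exp1` and `lsa_gain` are hypothetical, and the a priori SNR `xi` and a posteriori SNR `gamma` are assumed to be estimated elsewhere.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def exp1(x: float) -> float:
    """Exponential integral E1(x) for x > 0 (pure-Python helper, assumed name)."""
    if x <= 0.0:
        raise ValueError("E1(x) is evaluated here only for x > 0")
    if x <= 1.0:
        # Power series: E1(x) = -gamma_E - ln(x) - sum_{k>=1} (-x)^k / (k * k!)
        total, term = 0.0, 1.0
        for k in range(1, 60):
            term *= -x / k      # term now equals (-x)^k / k!
            total -= term / k
        return -EULER_GAMMA - math.log(x) + total
    # Continued fraction for x > 1, evaluated backwards at a fixed depth:
    # E1(x) = exp(-x) / (x+1 - 1/(x+3 - 4/(x+5 - 9/(x+7 - ...))))
    depth = 80
    d = x + 2 * depth + 1
    for k in range(depth - 1, -1, -1):
        d = (x + 2 * k + 1) - (k + 1) ** 2 / d
    return math.exp(-x) / d


def lsa_gain(xi: float, gamma: float) -> float:
    """Textbook LSA gain: G = xi/(1+xi) * exp(0.5 * E1(v)), v = xi/(1+xi) * gamma.

    xi    : a priori SNR (linear scale, > 0), assumed to be given
    gamma : a posteriori SNR (linear scale, > 0), assumed to be given
    """
    wiener = xi / (1.0 + xi)
    v = wiener * gamma
    return wiener * math.exp(0.5 * exp1(v))


if __name__ == "__main__":
    noisy_amplitude = 1.0
    # The enhanced amplitude is the noisy amplitude scaled by the gain.
    print(lsa_gain(xi=1.0, gamma=2.0) * noisy_amplitude)
```

In this form the gain approaches the Wiener gain `xi / (1 + xi)` when `v` is large, since `E1(v)` then tends to zero. Practical systems typically also floor or clip the gain, which this sketch omits.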