Abstract

This paper investigates the distribution of the Mel-filtered log-spectrum (MFLS) of speech signals in noisy environments. Using a non-parametric method and no prior assumptions, we estimate the joint probability density functions (JPDFs) of the MFLS components of clean and noisy speech, and observe that the noise components follow a normal distribution. Clean and noisy speech, in turn, each follow a mixture of two normal distributions. The first lobe of this mixture fits the noise and corresponds to the intervals where speech is inactive, while the second lobe is formed by the speech components and is almost equal (highly correlated) to the PDF of clean speech. Each MFLS component is therefore accurately modeled by a normal PDF whose time-varying parameters are governed by a two-state Markov process. In contrast to previous work, this shows where noise mainly affects the distribution of clean speech. As an application example, we also present a novel noise-robust improvement to feature extraction for speech recognition that separates the non-speech and speech intervals of each MFLS component. The enhanced features are extracted from minimum-mean-square-error (MMSE) estimates of the MFLS coefficients. When evaluated on the Aurora 2 recognition task under different noise conditions, the proposed method not only outperforms both the baseline Mel-frequency cepstral coefficients (MFCC) and MFCC with cepstral mean and variance normalization (MFCC+CMVN), but also improves recognition performance when used in conjunction with ETSI-AFE.
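To make the two-lobe picture concrete, the following is a minimal sketch, not the paper's estimator. It fits a static mixture of two normal distributions to a single synthetic MFLS component with expectation-maximisation (EM), a technique swapped in here purely for illustration. The function name `fit_two_gaussians`, the synthetic means and variances, and the 40/60 split between non-speech and speech frames are all assumptions. The sketch ignores the two-state Markov dependence between frames described above.

```python
import numpy as np

def fit_two_gaussians(x, n_iter=200):
    """EM fit of a 1-D mixture of two normal distributions (illustrative)."""
    x = np.asarray(x, dtype=float)
    # Initialise the two lobes from the lower and upper quartiles.
    mu = np.percentile(x, [25.0, 75.0])
    var = np.full(2, x.var())
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior probability of each lobe for every frame,
        # computed in the log domain for numerical stability.
        log_p = (np.log(w) - 0.5 * np.log(2.0 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0)
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        var = np.maximum(var, 1e-6)  # guard against a collapsing lobe
    return w, mu, var, resp

# Synthetic stand-in for one MFLS component (assumed values, not real data):
# a low-energy "noise" lobe and a higher-energy "speech" lobe.
rng = np.random.default_rng(0)
noise_frames = rng.normal(2.0, 0.5, size=4000)
speech_frames = rng.normal(6.0, 1.0, size=6000)
mfls = np.concatenate([noise_frames, speech_frames])

w, mu, var, resp = fit_two_gaussians(mfls)
# resp[:, 1] is the per-frame posterior of the higher-mean (speech) lobe
# from the final E-step; thresholding it gives a crude speech/non-speech split.
speech_mask = resp[:, 1] > 0.5
```

On real features, `mfls` would be replaced by the frames of one Mel channel. A model closer to the one described above would additionally tie consecutive frames together with state-transition probabilities, for example in a two-state hidden Markov model.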
