Abstract

The performance of an automatic speech recognition (ASR) system degrades dramatically in noisy environments because of the mismatch between training and testing conditions. This paper presents an efficient robust method that combines minimum mean square error (MMSE) speech enhancement with cepstral mean normalization (CMN). In the front-end stage, MMSE enhancement suppresses the intrusive noise to a lower level, but usually at the cost of introducing spectral variation in the clean speech, which also severely affects recognition. CMN is therefore applied to compensate for the distortion, including both the spectral variation and the residual noise. Experimental evaluations show that the proposed method significantly improves recognition accuracy across a wide range of signal-to-noise ratios (SNRs), especially in very noisy environments.
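
To make the CMN stage concrete, the following is a minimal sketch of textbook per-utterance cepstral mean normalization, assuming the cepstral features are already available as a list of frames. It is not the authors' implementation: the function name `cepstral_mean_normalization` and the toy feature values are hypothetical, and the MMSE enhancement stage is omitted entirely.

```python
def cepstral_mean_normalization(frames):
    """Subtract the per-utterance mean of each cepstral coefficient.

    `frames` is a list of equal-length lists, one per analysis frame.
    This is the standard per-utterance form of CMN, assumed here for
    illustration; the paper may use a different variant.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each coefficient taken over all frames of the utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    # Remove that mean from every frame.
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Hypothetical toy cepstra: 3 frames x 2 coefficients.
utterance = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]

# The same utterance with a constant offset added to every frame,
# standing in for a stationary distortion in the cepstral domain.
offset = [0.5, -2.0]
distorted = [[c + o for c, o in zip(f, offset)] for f in utterance]

normalized_clean = cepstral_mean_normalization(utterance)
normalized_distorted = cepstral_mean_normalization(distorted)
```

In this toy case the two normalized sequences coincide, because a bias that is constant across frames is removed together with the mean. Real residual noise and enhancement artifacts are only approximately constant, so in practice CMN reduces the mismatch rather than eliminating it.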
