Improved modulation spectrum normalization techniques for robust speech recognition

Chi-An Pan, Chieh-Cheng Wang, Jeih-Weih Hung

doi:10.1109/icassp.2008.4518553

Abstract

The modulation spectra of speech features are often distorted by environmental interference. To reduce this distortion, we propose several approaches that normalize the power spectral density (PSD) of the feature stream to a reference function: least-squares temporal filtering (LSTF), least-squares spectrum fitting (LSSF), and magnitude spectrum interpolation (MSI). All of the proposed approaches are shown to improve speech recognition accuracy in a variety of noise-corrupted environments. In experiments on the Aurora-2 noisy digits database with a complex back-end, they provide an average relative error reduction of over 40% compared with baseline MFCC processing.
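The abstract names the three methods but does not define them, so the following is only a minimal sketch of the general idea of normalizing a feature trajectory's modulation spectrum to a reference, not the authors' LSTF, LSSF, or MSI algorithms. It assumes a single real-valued feature trajectory (for example, one cepstral coefficient over time) and a reference magnitude curve sampled uniformly from DC to the Nyquist frequency and already scaled for the utterance length. The reference is linearly interpolated to the utterance's DFT bins, the original phase is kept, and the signal is resynthesized. All function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Plain O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Inverse DFT, returning only the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def interpolate_reference(ref_mag, n_bins):
    """Linearly resample a reference magnitude curve to n_bins points.

    Assumption: ref_mag is sampled uniformly from DC to Nyquist.
    """
    if n_bins == 1:
        return [ref_mag[0]]
    out = []
    for i in range(n_bins):
        pos = i * (len(ref_mag) - 1) / (n_bins - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ref_mag) - 1)
        frac = pos - lo
        out.append(ref_mag[lo] * (1.0 - frac) + ref_mag[hi] * frac)
    return out


def normalize_modulation_spectrum(trajectory, ref_mag):
    """Replace the DFT magnitude of a trajectory with a reference magnitude.

    The phase of the original trajectory is preserved; only the magnitude
    (and hence the PSD) is forced to follow the reference curve.
    """
    n = len(trajectory)
    spectrum = dft(trajectory)
    half = n // 2 + 1  # bins from DC up to Nyquist
    target = interpolate_reference(ref_mag, half)

    new_spectrum = [0j] * n
    for k in range(half):
        phase = cmath.phase(spectrum[k])
        new_spectrum[k] = cmath.rect(target[k], phase)
    # Mirror with conjugate symmetry so the resynthesized signal is real.
    for k in range(half, n):
        new_spectrum[k] = new_spectrum[n - k].conjugate()
    return idft_real(new_spectrum)


if __name__ == "__main__":
    # Toy trajectory and toy reference curve (both illustrative only).
    noisy = [0.5, 1.2, -0.3, 0.8, 1.9, -1.1, 0.4, 0.7]
    reference = [4.0, 2.0, 1.0]
    normalized = normalize_modulation_spectrum(noisy, reference)
    print([round(v, 3) for v in normalized])
```

In practice the reference curve would presumably be estimated from clean training data, and an FFT routine would replace the quadratic-time DFT used here for self-containment.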
