Abstract

The modulation spectra of speech features are often distorted by environmental interference. To reduce this distortion, we propose several approaches that normalize the power spectral density (PSD) of the feature stream to a reference function: least-squares temporal filtering (LSTF), least-squares spectrum fitting (LSSF), and magnitude spectrum interpolation (MSI). All of the proposed approaches improve speech recognition accuracy in various noise-corrupted environments. In experiments on the Aurora-2 noisy digits database with a complex back-end, they achieve an average relative error reduction of over 40% compared with baseline MFCC processing.
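
To make the general idea concrete, the following is a minimal sketch of normalizing the modulation spectrum of a single feature trajectory to a reference. It is not the LSTF, LSSF, or MSI procedure, whose details the abstract does not give. It assumes a fixed trajectory length, a reference magnitude spectrum averaged over clean trajectories, and that the original phase is kept; the function name `normalize_modulation_spectrum` and all variable names are hypothetical.

```python
import numpy as np


def normalize_modulation_spectrum(x, ref_mag):
    """Replace the magnitude spectrum of a feature trajectory with ref_mag.

    x       : 1-D array, one feature coefficient over time (frames).
    ref_mag : 1-D array, reference magnitude spectrum with the same
              length as np.fft.rfft(x) (an assumption of this sketch).
    The phase of x is kept unchanged.
    """
    x = np.asarray(x, dtype=float)
    ref_mag = np.asarray(ref_mag, dtype=float)
    spectrum = np.fft.rfft(x)
    if ref_mag.shape != spectrum.shape:
        raise ValueError("ref_mag must match the rfft length of x")
    phase = np.angle(spectrum)
    # Combine the reference magnitude with the original phase.
    normalized = ref_mag * np.exp(1j * phase)
    return np.fft.irfft(normalized, n=len(x))


# Illustrative data only: random trajectories stand in for real features.
rng = np.random.default_rng(0)
clean = rng.standard_normal((5, 64))   # hypothetical clean trajectories
ref = np.abs(np.fft.rfft(clean, axis=1)).mean(axis=0)
noisy = rng.standard_normal(64)        # hypothetical distorted trajectory
y = normalize_modulation_spectrum(noisy, ref)
```

After this step the magnitude spectrum of `y` equals `ref` up to floating-point error. Since the PSD is proportional to the squared magnitude, this also fixes the PSD of the trajectory to that of the reference.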
