Abstract

Data-driven temporal filtering approaches based on a specific optimization criterion have been shown to enhance the discrimination and robustness of speech features in speech recognition. The filters in these approaches are often derived from the statistics of the features in the temporal domain. In this paper, we derive new data-driven temporal filters that instead employ the statistics of the modulation spectra of the speech features. The new temporal filtering approach is based on a constrained version of linear discriminant analysis (C-LDA). We show that the proposed C-LDA temporal filters effectively improve speech recognition accuracy in various noise-corrupted environments. In experiments conducted on Test Set A of the Aurora-2 noisy digits database, these new temporal filters, together with cepstral mean and variance normalization (CMVN), provide average relative error reduction rates of over 47% and 30% when compared with baseline MFCC processing and CMVN alone, respectively.
