Abstract

This paper incorporates more detailed, linguistically relevant speech information into the Mel-Frequency Cepstral Coefficient (MFCC) feature extraction process in order to improve the recognition accuracy of large-vocabulary continuous speech recognition (LVCSR). The detailed features are extracted to reflect the change of the energy spectrum within each mel-frequency bank (MFB). A normalized positive weighting vector combines the log channel energy feature of the standard MFCC with the new detailed-information features to form a single energy feature for each MFB. The optimal weighting vector can be obtained by Heteroscedastic Discriminant Analysis (HDA) before feature extraction. Experiments on two test sets show that the new feature extraction method outperforms the standard MFCC, with a 10% relative error reduction for LVCSR on the test set of standard-accent speakers.
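
The combination step described above can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: that the combination is a plain weighted sum, that one weighting vector is shared across all banks, and that each bank's feature list starts with its log channel energy followed by the detailed-information features. The function names `normalize_weights` and `combine_bank_features` are hypothetical, and the HDA estimation of the weights is not shown.

```python
from typing import List, Sequence


def normalize_weights(raw: Sequence[float]) -> List[float]:
    """Scale strictly positive weights so that they sum to 1."""
    if not raw or any(w <= 0 for w in raw):
        raise ValueError("weights must be non-empty and strictly positive")
    total = sum(raw)
    return [w / total for w in raw]


def combine_bank_features(
    per_bank: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Collapse each bank's feature list into one energy value.

    per_bank[m] holds, for bank m, the log channel energy followed by the
    detailed-information features (layout assumed for illustration).
    The weighted sum is an assumed form of the combination.
    """
    w = normalize_weights(weights)
    combined = []
    for feats in per_bank:
        if len(feats) != len(w):
            raise ValueError("each bank needs one feature per weight")
        combined.append(sum(wk * fk for wk, fk in zip(w, feats)))
    return combined


# Two banks, each with a log energy and one detailed feature (toy values).
energies = combine_bank_features([[2.0, 4.0], [1.0, 3.0]], [1.0, 1.0])
```

In a full pipeline the resulting per-bank values would take the place of the log channel energies in the usual MFCC computation; the weights here are placeholders rather than HDA-derived values.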

