Abstract

State-of-the-art automatic speech recognition (ASR) systems follow a well-established statistical paradigm: parameterization of the speech signal (feature extraction) at the front end and likelihood evaluation of the resulting feature vectors at the back end. For feature extraction, Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) are the two dominant signal processing methods in ASR. Although the effects of both techniques have been analyzed individually, it is not known whether any combination of the two can improve recognition accuracy. This paper investigates whether integrating different types of features, such as MFCC, PLP, and gravity centroids, can improve ASR performance for the Hindi language. Our experimental results show a significant improvement for a few such combinations when applied to medium-size lexicons in typical field conditions.
