Abstract

Automatic Speech Recognition (ASR) for low-resource languages still calls for more efficient systems. In this work, an ASR system is developed for the Gujarati language using a publicly available dataset. The front end combines Mel-Frequency Cepstral Coefficients (MFCC) with Constant Q Cepstral Coefficients (CQCC) as an integrated feature extraction technique. The back end applies a hybrid acoustic model: a two-dimensional Convolutional Neural Network (Conv2D) followed by Bi-directional Gated Recurrent Units (BiGRU). The system is trained with the Connectionist Temporal Classification (CTC) loss function, and transcriptions are obtained with greedy and prefix-based CTC decoders. The proposed joint MFCC and CQCC features yield a 10–19% improvement in Word Error Rate (WER) over isolated delta-delta features with the same integrated model.
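The abstract mentions greedy CTC decoding as the final step after the Conv2D–BiGRU acoustic model. The sketch below is not the authors' code; it only illustrates the standard CTC greedy rule (collapse consecutive repeats, then remove blanks), assuming per-frame class indices have already been obtained by an argmax over the model's softmax outputs and that index 0 is the CTC blank symbol.

```python
BLANK = 0  # assumed blank index; the actual value depends on the label set

def ctc_greedy_decode(frame_ids):
    """Apply the CTC collapse rule: merge repeated symbols, drop blanks."""
    decoded = []
    prev = None
    for idx in frame_ids:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank token.
        if idx != prev and idx != BLANK:
            decoded.append(idx)
        prev = idx
    return decoded

# Example: frames [1, 1, 0, 1, 2, 2, 0, 0, 3] decode to [1, 1, 2, 3] —
# the blank between the two 1s keeps them as separate output symbols.
```

A prefix-based (beam) decoder, also used in the paper, generalizes this by keeping several candidate prefixes per frame and summing probabilities over the alignments that collapse to each prefix, rather than committing to the single best symbol per frame.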
