Abstract

Automatic Speech Recognition (ASR) for low-resource languages still calls for more efficient systems. In this work, an ASR system is developed for the Gujarati language using a publicly available dataset. The front end combines Mel-Frequency Cepstral Coefficients (MFCC) with Constant Q Cepstral Coefficients (CQCC) as an integrated feature extraction technique. The back end applies a hybrid acoustic model: a two-dimensional Convolutional Neural Network (Conv2D) followed by Bi-directional Gated Recurrent Units (BiGRU). The system is trained with the Connectionist Temporal Classification (CTC) loss function, and transcriptions are obtained with greedy and prefix-based CTC decoders. The proposed joint MFCC and CQCC features yield a 10–19% improvement in Word Error Rate (WER) over isolated delta-delta features with the same integrated model.
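The abstract mentions greedy CTC decoding as the final step after the Conv2D–BiGRU acoustic model. The sketch below is not the authors' code; it only illustrates the standard CTC greedy rule (collapse consecutive repeats, then remove blanks), assuming per-frame class indices have already been obtained by an argmax over the model's softmax outputs and that index 0 is the CTC blank symbol.

```python
BLANK = 0  # assumed blank index; the actual value depends on the label set

def ctc_greedy_decode(frame_ids):
    """Apply the CTC collapse rule: merge repeated symbols, drop blanks."""
    decoded = []
    prev = None
    for idx in frame_ids:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank token.
        if idx != prev and idx != BLANK:
            decoded.append(idx)
        prev = idx
    return decoded

# Example: frames [1, 1, 0, 1, 2, 2, 0, 0, 3] decode to [1, 1, 2, 3] —
# the blank between the two 1s keeps them as separate output symbols.
```

A prefix-based (beam) decoder, also used in the paper, generalizes this by keeping several candidate prefixes per frame and summing probabilities over the alignments that collapse to each prefix, rather than committing to the single best symbol per frame.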
