Enhancing Performance of End-to-End Gujarati Language ASR using combination of Integrated Feature Extraction and Improved Spell Corrector Algorithm

M Dua, Mohit Dua, Bhavesh Bhagat, A Jain, T.N Sasamal, P Verma

doi:10.1051/itmconf/20235401016

Abstract

Recent advances in algorithms and computing resources have produced a number of intricate deep learning architectures for effective End-to-End (E2E) speech recognition. This work develops an automatic speech recognition (ASR) system for a publicly accessible Gujarati-language dataset. At the front end, the approach combines Mel Frequency Cepstral Coefficient (MFCC) and Constant Q Cepstral Coefficient (CQCC) features. The back end uses a Gated Recurrent Unit (GRU) based DeepSpeech2 architecture together with an enhanced BERT-based spell corrector. The study shows that combining MFCC and CQCC features extracted from speech with the GRU-based DeepSpeech2 model and the enhanced spell corrector improves the Word Error Rate (WER) by 17.46% compared with the model without post-processing.
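The abstract reports its result as a Word Error Rate. As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), the function below may help; the name `word_error_rate` and the whitespace tokenization are assumptions for illustration, not the authors' evaluation code, and the abstract does not say whether the 17.46% figure is an absolute or a relative change.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a post-processing step such as a spell corrector would be judged by computing the WER of the transcripts before and after correction against the same references.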

Journal: ITM Web of Conferences
Publication Date: Jan 1, 2023
Citations: 1
License type: CC BY 4.0

Similar Papers

Gujarati Language Automatic Speech Recognition Using Integrated Feature Extraction and Hybrid Acoustic Model
Mohit Dua ... Akanksha
01 Jan 2023

A new perceptually motivated MVDR-based acoustic front-end (PMVDR) for robust automatic speech recognition
Umit H Yapanel ... John H.L Hansen
Speech Communication | VOL. 50
19 Sep 2007

Replay Spoof Attack Detection using Deep Neural Networks for Classification
Salahaldeen Duraibi ... Wasim Alhamdani
01 Dec 2020

Constant Q Cepstral coefficients for classification of normal vs. Pathological infant cry
Hemant A Patil ... Ankur T Patil
23 May 2022
