An Investigation of Multilingual TDNN-BLSTM Acoustic Modeling for Hindi Speech Recognition

Ankit Kumar,Rajesh Kumar Aggarwal

doi:10.2174/2210327911666210118143758

Abstract

Background: In India, thousands of languages or dialects are in use. Most Indian dialects are low asset dialects. A well-performing Automatic Speech Recognition (ASR) system for Indian languages is unavailable due to a lack of resources. Hindi is one of them as large vocabulary Hindi speech datasets are not freely available. We have only a few hours of transcribed Hindi speech dataset. There is a lot of time and money involved in creating a well-transcribed speech dataset. Thus, developing a real-time ASR system with a few hours of the training dataset is the most challenging task. The different techniques like data augmentation, semi-supervised training, multilingual architecture, and transfer learning, have been reported in the past to tackle the fewer speech data issues. In this paper, we examine the effect of multilingual acoustic modeling in ASR systems for the Hindi language. Objective: This article’s objective is to develop a high accuracy Hindi ASR system with a reasonable computational load and high accuracy using a few hours of training data. Method: To achieve this goal we used Multilingual training with Time Delay Neural Network- Bidirectional Long Short Term Memory (TDNN-BLSTM) acoustic modeling. Multilingual acoustic modeling has significantly improved the ASR system's performance for low and limited resource languages. The common practice is to train the acoustic model by merging data from similar languages. In this work, we use three Indian languages, namely Hindi, Marathi, and Bengali. Hindi with 2.5 hours of training data and Marathi with 5.5 hours of training data and Bengali with 28.5 hours of transcribed data, was used in this work to train the proposed model. Results: The Kaldi toolkit was used to perform all the experiments. The paper is investigated over three main points. First, we present the monolingual ASR system using various Neural Network (NN) based acoustic models. Second, we show that Recurrent Neural Network (RNN) language modeling helps to improve the ASR performance further. Finally, we show that a multilingual ASR system significantly reduces the Word Error Rate (WER) (absolute 2% WER reduction for Hindi and 3% for the Marathi language). In all the three languages, the proposed TDNN-BLSTM-A multilingual acoustic models help to get the lowest WER. Conclusion: The multilingual hybrid TDNN-BLSTM-A architecture shows a 13.67% relative improvement over the monolingual Hindi ASR system. The best WER of 8.65% was recorded for Hindi ASR. For Marathi and Bengali, the proposed TDNN-BLSTM-A acoustic model reports the best WER of 30.40% and 10.85%.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

R Discovery Prime

R Discovery Prime

An Investigation of Multilingual TDNN-BLSTM Acoustic Modeling for Hindi Speech Recognition

Abstract

Talk to us

Similar Papers

More From: International Journal of Sensors, Wireless Communications and Control

Lead the way for us

Journal: International Journal of Sensors, Wireless Communications and Control	Publication Date: Jan 1, 2022
Citations: 1

Similar Papers

Theoretical Analysis of Diversity in an Ensemble of Automatic Speech Recognition Systems
Kartik Audhkhasi ... Shrikanth S Narayanan
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 22
Kartik Audhkhasi, et. al.Kartik Audhkhasi ... Shrikanth S Narayanan
01 Mar 2014
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 22

Retrospective Analysis of Clinical Performance of an Estonian Speech Recognition System for Radiology: Effects of Different Acoustic and Language Models
A. Paats ... I. Fridolin
Journal of Digital Imaging | VOL. 31
A. Paats, et. al.A. Paats ... I. Fridolin
30 Apr 2018
Journal of Digital Imaging | VOL. 31

Enhancements in automatic Kannada speech recognition system by background noise elimination and alternate acoustic modelling
G Thimmaraja Yadava ... H S Jayanna
International Journal of Speech Technology | VOL. 23
G Thimmaraja Yadava, et. al.G Thimmaraja Yadava ... H S Jayanna
22 Jan 2020
International Journal of Speech Technology | VOL. 23

Using Auxiliary Sources of Knowledge for Automatic Speech Recognition

-

01 Jan 2004
01 Jan 2004

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

An Investigation of Multilingual TDNN-BLSTM Acoustic Modeling for Hindi Speech Recognition

Abstract

Talk to us

Similar Papers

More From: International Journal of Sensors, Wireless Communications and Control