Abstract
Objectives: The primary objective is to establish a Continuous Speech Recognition (CSR) framework for recognising continuous speech in Kannada. Dealing with a regional language such as Kannada is challenging because it lacks even a single standard speech database. Methods: Modelling techniques such as monophone, triphone, deep neural network (DNN)-hidden Markov model (HMM) and Gaussian Mixture Model (GMM)-HMM based models were implemented in the Kaldi toolkit and used for continuous Kannada speech recognition (CKSR). The Mel-frequency cepstral coefficient (MFCC) technique is used to extract feature vectors from the speech data. The continuous Kannada speech database consists of 2800 speakers (1680 males and 1120 females) belonging to the age group of 8 to 80 years. The training and testing data are split in the ratio 80:20. In this paper, hybrid modelling techniques are implemented to recognise the spoken words. Findings: Model efficiency is determined by the word error rate (WER), and the obtained results are assessed against well-known datasets such as TIMIT and Aurora-4. This study found that the Kaldi-based feature-extraction recipes for the monophone, triphone, DNN-HMM and GMM-HMM acoustic models yielded word error rates of 8.23%, 5.23%, 4.05% and 4.64%, respectively. The experimental results suggest that the recognition rate on the Kannada speech data is higher than that reported for state-of-the-art databases. Novelty: We propose a novel automatic speech recognition system for the Kannada language. The main reason for developing it is that only limited sources of standard continuous Kannada speech are available. We created a large-vocabulary Kannada database and implemented monophone, triphone, Subspace Gaussian Mixture Model (SGMM) and hybrid modelling techniques to develop the automatic speech recognition system for Kannada.
Keywords: DNN; Continuous speech; HMM; Kannada dialect; Kaldi toolkit; monophone; triphone; WER
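Model efficiency above is reported as word error rate (WER), i.e. the word-level edit distance between the decoded hypothesis and the reference transcript, normalised by the reference length. As a point of reference only (not the authors' Kaldi scoring pipeline), a minimal Python sketch of this computation is given below; the function name and example sentences are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution in a four-word reference gives a WER of 25%.
print(word_error_rate("open the front door", "open a front door"))  # 0.25
```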
Highlights
Effective research into Kannada speech recognition (SR) is urgently needed
This work sets up a Continuous Speech Recognition (CSR) framework for the Kannada language using phoneme modelling, where each phoneme is represented by a 5-state hidden Markov model (HMM) and the output distribution of each state is modelled by a Gaussian Mixture Model (GMM); a minimal illustrative sketch of this topology follows these highlights
The findings reveal that the SR systems produce a phone error rate (PER) of 24.21% and a word error rate (WER) of 4.12%
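The highlighted topology (one 5-state HMM per phoneme, with each state's emission density modelled by a GMM over MFCC frames) can be sketched as follows. This is an illustrative toy example using the hmmlearn library and random placeholder data, not the Kaldi recipes used in the paper; the number of mixtures, the MFCC dimension and the phoneme subset are assumed values.

```python
# Illustrative sketch only: the paper builds its acoustic models with Kaldi recipes.
import numpy as np
from hmmlearn.hmm import GMMHMM  # assumed dependency: pip install hmmlearn

N_STATES = 5     # states per phoneme HMM, as stated in the highlights
N_MIXTURES = 2   # Gaussians per state (assumed, for illustration)
MFCC_DIM = 13    # static MFCC dimension (assumed)

def make_phoneme_model() -> GMMHMM:
    """One GMM-HMM acoustic model for a single phoneme."""
    return GMMHMM(n_components=N_STATES, n_mix=N_MIXTURES,
                  covariance_type="diag", n_iter=20)

rng = np.random.default_rng(0)
phonemes = ["a", "i", "u", "k", "n"]  # tiny illustrative subset of the phone set
models = {}
for p in phonemes:
    # Placeholder for the real MFCC frames aligned to phoneme p.
    frames = rng.standard_normal((300, MFCC_DIM))
    model = make_phoneme_model()
    model.fit(frames)
    models[p] = model

# Classify a new segment by the phoneme model with the highest log-likelihood.
segment = rng.standard_normal((30, MFCC_DIM))
best = max(models, key=lambda p: models[p].score(segment))
print("best-matching phoneme:", best)
```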
Summary
Effective research into Kannada SR is urgently needed. This work sets up a CSR framework for the Kannada language using phoneme modelling, where each phoneme is represented by a 5-state HMM and each state is modelled by a GMM. Such a system can be very useful for digitising old palm-leaf manuscript documents by having someone read them aloud, and such efforts will contribute to research on SR systems for the Kannada language. In [9], the authors presented their work on building an LVCSR system for the Tamil dialect using a DNN. They used 8 hours of Tamil speech collected from 30 speakers with a lexicon size of 13,984 words, of which 5 hours was used for training. The extensive literature survey concludes that work on CKSR is not remarkable. This motivated us to conduct experiments by developing our own database of 2800 speakers, gathered throughout the state of Karnataka under real-world conditions, and to check the behaviour of state-of-the-art techniques on continuous Kannada speech. A phoneme-level lexicon is built according to the speech data; a sketch of its format is given below.
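In Kaldi, such a phoneme-level lexicon is a plain-text file mapping each word to its phone sequence, one pronunciation per line, which utils/prepare_lang.sh then consumes when building the language directory. The sketch below writes a lexicon in that format; the transliterated Kannada words and phone symbols are illustrative placeholders, not the paper's actual inventory.

```python
# Minimal sketch: write a Kaldi-style lexicon.txt ("<word> <phone> <phone> ...").
# The entries below are illustrative placeholders, not the paper's lexicon.
lexicon = {
    "namaskara": ["n", "a", "m", "a", "s", "k", "aa", "r", "a"],
    "shaale":    ["sh", "aa", "l", "e"],
    "mane":      ["m", "a", "n", "e"],
}

with open("lexicon.txt", "w", encoding="utf-8") as f:
    for word, phones in sorted(lexicon.items()):
        f.write(f"{word} {' '.join(phones)}\n")
```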