Kaldi recipe in Hindi for word level recognition and phoneme level transcription

Karra Venkata Lakshmi Sri, Radhika Rajeev Nair, Mayuka Srinivasan, Deepa Gupta, K Jeeva Priya

doi:10.1016/j.procs.2020.04.268

Open Access

https://doi.org/10.1016/j.procs.2020.04.268

Abstract

This paper presents an automatic speech recognition (ASR) system for Hindi. The language models and acoustic models are built using the open-source toolkit Kaldi. A significant portion of the corpus built for this work pertains to the medical domain, as our primary emphasis is the application of speech processing to medical transcription. The acoustic models compared by word error rate (WER) in Kaldi include the HMM-GMM (Hidden Markov Model-Gaussian Mixture Model) based monophone and triphone (tri1, tri2, tri3) models and the SGMM (Subspace Gaussian Mixture Model). Among the acoustic models compared, the tri3 model achieved the lowest WER. The possible mappings of the detected phonemes are also shown.
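Since the comparison above rests on WER, a minimal sketch of how the metric is commonly defined may help: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the sample strings are illustrative assumptions; this is not the paper's scoring code, and Kaldi ships its own scoring tools.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example strings, not data from the paper.
    reference = "the patient has a mild fever"
    hypothesis = "the patient has mild fewer"
    print(f"WER = {word_error_rate(reference, hypothesis):.3f}")
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.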

Full Text