Abstract

This paper presents an automatic speech recognition (ASR) system for Hindi. The language models and acoustic models are built using the open-source toolkit Kaldi. A significant portion of the corpus built for this work pertains to the medical domain, as our primary emphasis is the application of speech processing to medical transcription. The acoustic models compared by word error rate (WER) in Kaldi are the HMM-GMM (Hidden Markov Model-Gaussian Mixture Model) based monophone and triphone (tri1, tri2, tri3) models and the SGMM (Subspace Gaussian Mixture Model). Among these, the tri3 model was observed to have the lowest WER. The possible mappings of the detected phonemes are also shown.

