Abstract

This paper describes a speech recognition system for Kannada, a South Indian language, built with the Kaldi toolkit. Kaldi is an open-source toolkit based on finite-state transducers (FSTs). Two speech data sets were collected from 10 different speakers (5 male and 5 female). The first data set is a Kannada digit corpus in which each speaker spoke a number ten times, and the second data set consists of simple Kannada phrases. Noise was largely filtered out manually, and the data were segmented using the software application Audacity (v2.2.2). The main objective is to compare the word error rate (WER) on the two data sets using different acoustic models based on Gaussian Mixture Models (GMM) and Subspace Gaussian Mixture Models (SGMM). On the first data set, the WER is 4.54% for GMM and 4.27% for SGMM; on the second data set, the WER is 12.27% for GMM and 13% for SGMM.
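
For readers unfamiliar with the metric, WER is conventionally defined as the number of word-level substitutions, deletions, and insertions needed to turn the recognized transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal illustrative sketch of that standard definition, not the scoring procedure used in this work or Kaldi's own scoring tools. The function name `word_error_rate` and the transliterated example words are hypothetical placeholders.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Illustrative only: whitespace tokenization is an assumption, and
    real scoring pipelines usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transliterated digit sequence with one word dropped
# by the recognizer: 1 deletion over 4 reference words.
print(word_error_rate("ondu eradu mooru naalku", "ondu mooru naalku"))
```

Because insertions count as errors, a WER computed this way can exceed 100% when the hypothesis is much longer than the reference.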
