Development and comparison of ASR models using Kaldi for noisy and enhanced Kannada speech data

G Thimmaraja Yadava, H S Jayanna

doi:10.1109/icacci.2017.8126111

Abstract

In this work, Automatic Speech Recognition (ASR) models are developed using the Kaldi speech recognition toolkit to build an ASR system for the Kannada language. A sufficient amount of speech data is collected from farmers in the field across the different dialect regions of Karnataka state to capture the possible pronunciation variations. Because it is collected in an uncontrolled environment, this speech data is generally noisy. A speech enhancement method is proposed that combines Spectral Subtraction with Voice Activity Detection (SS-VAD) and a Minimum Mean Square Error Spectrum Power Estimator based on Zero Crossing (MMSE-SPZC). The noisy and enhanced speech data are transcribed and validated at the word level using the Indic language transliteration tool (IT3 to UTF-8). The Kannada dictionary and phoneme set are created using the Indian Language Speech Label (ILSL12) set. Of the validated speech data, 75% is used for system training and 25% for testing. ASR models are developed using the Kaldi recipe and the Kannada language resources, and the Word Error Rates (WERs) obtained for the noisy and enhanced speech data are discussed and compared. The best ASR models could be used in a spoken query system for accessing timely agricultural commodity price and weather information in Kannada.
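For readers unfamiliar with the metric, the following is a minimal sketch of how a Word Error Rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcription and a recognizer hypothesis, divided by the number of reference words. It is not the authors' code or Kaldi's scoring script; the function name `word_error_rate` and the placeholder tokens are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name), not part of Kaldi.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens, not real Kannada data: one substitution
    # (w2 -> wX) and one deletion (w4) against four reference words.
    print(word_error_rate("w1 w2 w3 w4", "w1 wX w3"))
```

Because insertions are counted, a WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.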

Full Text