Abstract
This work demonstrates the latest enhancements to the end-to-end (E2E) isolated Kannada automatic speech recognition (ASR) system, achieved by combining a robust background noise elimination technique with a time delay neural network (TDNN). The E2E Kannada ASR system consists of an interactive voice response system (IVRS), ASR models, and databases containing weather and agricultural commodity price information. In the earlier spoken query system (SQS), babble, street noise, and other background noises reduced both offline and online speech recognition accuracy. To train the models properly, we enlarge the database by collecting Kannada speech data from an additional 500 farmers under real-time conditions. The proposed noise elimination technique is then used to enhance the degraded speech data, and the efficacy of the TDNN is explored to improve the recognition accuracy of the ASR models and the SQS. The results of the proposed speech enhancement algorithm show no audible musical noise or other background noise in the enhanced NOIZEUS and isolated Kannada speech databases. Using Kannada language resources together with the proposed noise reduction technique and the TDNN, we achieve a significant 1.1% improvement in speech recognition accuracy over the previously developed deep neural network-hidden Markov model (DNN-HMM) based SQS. The enhanced isolated E2E ASR SQS was tested by 500 farmers, who used it to access real-time agricultural commodity prices and weather forecast information in their native Kannada language and dialects. This practical validation highlights the applicability and effectiveness of these advancements in real-world scenarios.
The source code of the proposed noise elimination technique, the experimental results of the ASR models, and a demo conversation of the SQS are publicly available at: https://sites.google.com/view/thimmarajayadavag/downloads