Abstract

The recent advancements in conversational Artificial Intelligence (AI) are fastly getting integrated with every realm of human lives. Conversational agents who can learn, understand human languages and mimic the human thinking process have already created a revolution in human lifestyle. Understanding the intention of a speaker from his natural speech is a significant step in conversational AI. A major challenge that hinders the efficacy of this process is the lack of language resources. In this research, we address this issue and develop a domain-specific speech command classification system for the Sinhala language, one of the low-resourced languages. An effective speech command classification system can be utilized in several value added applications such as speech dialog systems. Our speech command classification system is developed by integrating Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU). The ASR engine is implemented using Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) and it converts a Sinhala speech command into a corresponding text representation. The text classifier, which is implemented as an ensemble unit of several classifiers, predicts the intent of the speaker when provided with the above text output. In this paper, we discuss and evaluate various algorithms and techniques that can be utilized to optimize the performance of both the ASR and text classifier. As well, we present our novel Sinhala speech data corpus of 4.15[Formula: see text]h which is based on the banking domain. As the final outcome, our system reports its Sinhala speech command classification accuracy as 91.03%. It shows that our system outperforms the state-of-the-art speech-to-intent mapping systems developed for the Sinhala language. The individual evaluation on the ASR system reports a 9.91% Word Error Rate and a 19.95% Sentence Error Rate, suggesting the applicability of advanced speech recognition techniques despite the limited language resources. Finally, our findings deliver useful insights on further research in speech command classification in the low-resourced context.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call