Improvements in IITG Assamese Spoken Query System: Background Noise Suppression and Alternate Acoustic Modeling

S Shahnawazuddin, Siddika Imani, Deepak Thotappa, S R M Prasanna, Abhishek Dey, Rohit Sinha

doi:10.1007/s11265-016-1133-6

Abstract

In this work, we present recent improvements to our previously developed Assamese spoken query (SQ) system for accessing the prices of agricultural commodities. The SQ system consists of interactive voice response (IVR) and automatic speech recognition (ASR) modules developed using open-source resources. The speech data used to train the ASR system contain a high level of background noise because they were collected under field conditions. In the earlier version of the SQ system, this background noise degraded recognition performance. In the improved version, a background noise suppression module based on zero frequency filtering is added before feature extraction. In addition, we explore the recently reported subspace Gaussian mixture model (SGMM) and deep neural network (DNN) based acoustic modeling approaches, which have been reported to be more powerful than the GMM-HMM approach employed in the previous version. Further, the foreground-separated speech data are used to train the acoustic models for all systems. Combining noise removal with SGMM/DNN-based acoustic modeling yields a relative improvement of 39% in word error rate (WER) over the previously reported GMM-HMM-based ASR system. On-line testing of the developed SQ system, carried out with real farmers, is also presented. Finally, we attempt to quantify the usability of the developed SQ system and find that the explored enhancements improve usability as well.
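The abstract names zero frequency filtering (ZFF) as the basis of the noise suppression front end but does not spell out the filter itself. The Python below is a minimal sketch, assuming the commonly described ZFF recipe: difference the signal, pass it through two cascaded zero-frequency resonators, then repeatedly subtract a local mean over a window of roughly one to two pitch periods. Its positive-going zero crossings are taken as epoch candidates. How the authors turn this into foreground/background separation is not described here, so the function names (`zero_frequency_filter`, `detect_epochs`), the 10 ms window and the three trend-removal passes are illustrative assumptions, not the SQ system's implementation.

```python
def _remove_local_mean(y, half):
    """Subtract the mean of a (2*half + 1)-sample window centred on each sample.

    Windows are clipped at the signal edges, so the trend is not fully
    removed within a few windows of either end.
    """
    n_samples = len(y)
    out = []
    for n in range(n_samples):
        lo = max(0, n - half)
        hi = min(n_samples, n + half + 1)
        out.append(y[n] - sum(y[lo:hi]) / (hi - lo))
    return out


def zero_frequency_filter(signal, fs, window_ms=10.0, trend_passes=3):
    """Return a zero-frequency-filtered version of `signal` (hypothetical sketch).

    window_ms should be about one to two average pitch periods; 10 ms is an
    assumed value, not one taken from the paper.
    """
    if not signal:
        return []
    # 1. Difference the signal to suppress any DC offset.
    x = [float(signal[0])] + [float(signal[n] - signal[n - 1])
                              for n in range(1, len(signal))]

    # 2. Two cascaded zero-frequency resonators, each with a double pole
    #    at z = 1: y[n] = x[n] + 2*y[n-1] - y[n-2].
    y = x
    for _ in range(2):
        out = [0.0] * len(y)
        for n in range(len(y)):
            prev1 = out[n - 1] if n >= 1 else 0.0
            prev2 = out[n - 2] if n >= 2 else 0.0
            out[n] = y[n] + 2.0 * prev1 - prev2
        y = out

    # 3. The resonators leave a rapidly growing polynomial trend; remove it
    #    by repeatedly subtracting a local mean.
    half = max(1, round(window_ms * 1e-3 * fs / 2))
    for _ in range(trend_passes):
        y = _remove_local_mean(y, half)
    return y


def detect_epochs(zff):
    """Indices of positive-going (negative to non-negative) zero crossings."""
    return [n for n in range(1, len(zff)) if zff[n - 1] < 0.0 <= zff[n]]


# Usage: synthetic excitation with negative impulses every 80 samples
# (100 Hz at fs = 8 kHz).
fs = 8000
period = 80
impulses = list(range(40, 4000, period))
sig = [0.0] * 4000
for t in impulses:
    sig[t] = -1.0
zff = zero_frequency_filter(sig, fs)
# Ignore crossings near the edges, where clipped windows leave residual trend.
interior = [e for e in detect_epochs(zff) if 400 <= e <= 3600]
```

On this synthetic impulse train the interior crossings fall within a sample or two of the impulses and are spaced one period apart. Samples near the signal boundaries are excluded because the clipped averaging windows there do not cancel the resonator trend. A production front end would more likely vectorize the filtering (for example with an IIR filter routine), but plain loops keep each step of the recipe visible.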
