Abstract

In automated speech recognition, system performance is crucial for meeting the requirements of human-machine interfaces (HMI) and, more recently, of IoT-related applications. At the same time, there is growing demand for detecting strong, discriminative features in speech utterances. This paper presents the performance of machine learning algorithms developed for spoken-digit recognition and classification. The prepared dataset contains utterances of the words for the numbers 1 to 10 from speakers of different age groups. Audacity was used to preprocess the audio files, which involved removing noise from the signal and trimming the silence on either side of each word utterance. The audio signals were sampled at fs = 48 kHz. We developed four AI models to recognise the word utterances. Each audio signal is processed separately to derive two feature sets for the word utterance: a set of statistical features and a set of singular values obtained by singular value decomposition (SVD). The cepstral values for each utterance are obtained as Mel-frequency cepstral coefficients (MFCC). A variance-covariance matrix is calculated from the resulting MFCC matrix. Its diagonal values, which are the variances, are recorded as feature set-1 for the word utterance and used as input to the machine learning algorithms, and the performance metrics of the developed models are recorded. To keep the computational cost of using the feature sets to a minimum, dimensionality reduction is carried out by applying singular value decomposition to the extracted MFCC matrix. The resulting set of singular values, feature set-2, is used to train and test the developed AI models with a 70:30 split. We present and discuss the performance of the MLP, KNN, SVM, and Random Forest algorithms. In comparison, MLP and Random Forest showed excellent performance on both feature sets, with 100% training accuracy and 99% test accuracy.
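
The two feature sets described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the MFCC matrix is laid out as frames by coefficients, the variance uses the sample (n - 1) normalisation, and the function names `variance_features` and `singular_values` are hypothetical. The toy matrix stands in for real MFCC output. The singular values are computed with a one-sided Jacobi iteration in pure Python only to keep the example self-contained; a numerical library routine such as `numpy.linalg.svd` would normally be used instead.

```python
import math


def variance_features(mfcc):
    """Feature set-1: diagonal of the variance-covariance matrix.

    `mfcc` is a list of frames, each a list of cepstral coefficients
    (assumed layout). The diagonal entry for a coefficient is its sample
    variance across frames.
    """
    n_frames = len(mfcc)
    features = []
    for column in zip(*mfcc):  # one column per cepstral coefficient
        mean = sum(column) / n_frames
        features.append(sum((x - mean) ** 2 for x in column) / (n_frames - 1))
    return features


def singular_values(matrix, max_sweeps=50, tol=1e-12):
    """Feature set-2: singular values via one-sided Jacobi rotations.

    Pairs of columns are rotated until they are mutually orthogonal;
    the column norms are then the singular values.
    """
    cols = [list(c) for c in zip(*matrix)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(len(cols) - 1):
            for q in range(p + 1, len(cols)):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns already orthogonal
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(len(cols[p])):
                    xp, xq = cols[p][i], cols[q][i]
                    cols[p][i] = c * xp - s * xq
                    cols[q][i] = s * xp + c * xq
        if not rotated:
            break
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)


# Toy stand-in for an MFCC matrix: 3 frames, 2 coefficients (not real data).
mfcc = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
feature_set_1 = variance_features(mfcc)
feature_set_2 = singular_values(mfcc)
```

Both functions return one value per cepstral coefficient, so each utterance is reduced to a short fixed-length vector regardless of how many frames it contains, which is what allows the vectors to be fed to a classifier.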
