Abstract
This paper proposes an Automatic Speech Recognition (ASR) process that extracts Mel-Frequency Cepstral Coefficients (MFCCs) from voice command signals, applies the Discrete Cosine Transform (DCT) to these coefficients, trains a Support Vector Machine (SVM) optimized by Particle Swarm Optimization (PSO) to speed up the whole process, and classifies with a One-Against-All (OAA) multiclass SVM. The main contribution lies in the training phase, which combines the SVM with the PSO algorithm and thereby reduces computational load and processing time. This novel algorithm, referred to here as PSO-SVM hybrid training, is evaluated experimentally on voice commands in Brazilian Portuguese. The commands comprise 10 isolated digits (zero to nine) and 20 action commands such as “go ahead”, “finish”, and “pause”, giving 30 different pattern types (classes) to be recognized. The process is speaker independent; that is, the voice bank used in training is different from the one used in tests. The classifier using the RBF kernel function achieved success rates of 92% to 99% in the tests. In addition, the comparison section shows that this technique is 25 times faster than recognition without optimization and improves the recognition success rate by 10% compared with the well-known Gaussian Mixture Models (GMM) algorithm. The proposed algorithm can also be applied on any data processing board for voice signals (DSP, FPGA, dsPIC, ...).
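To make the PSO component concrete, the following is a minimal sketch of a generic particle swarm search, not the paper's PSO-SVM implementation. The abstract does not say which SVM quantities PSO optimizes, so the search space here (the logarithms of an RBF-SVM penalty C and kernel width gamma) is an assumption, and `toy_error` is a hypothetical stand-in for a validation error that would in practice come from training and scoring an SVM. The function names and the swarm parameters (inertia, acceleration constants, swarm size) are likewise illustrative choices.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)

    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its own best position; the swarm shares one.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull to personal best + pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (pbest[i][d] - pos[i][d])
                             + c_social * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_error(params):
    """Hypothetical stand-in for SVM validation error (assumption).

    A smooth bowl with its minimum at log10(C) = 1, log10(gamma) = -2.
    """
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best_params, best_error = pso_minimize(
    toy_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
```

In a real pipeline, `toy_error` would be replaced by a function that trains the OAA SVM on MFCC/DCT features with the candidate parameters and returns its error on held-out speakers. The swarm loop itself would stay unchanged.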