Robust phoneme classification for automatic speech recognition using hybrid features and an amalgamated learning model

Mohammed Kamal Khwaja, Simon Lui, P Arulmozhivarman, Peddakota Vikash

doi:10.1007/s10772-016-9377-x

Abstract

Phoneme recognition is an important aspect of speech processing and recognition. It has been studied for many years, and numerous algorithms have been developed to improve its accuracy. In this paper, we present a quantitative analysis of phoneme recognition using supervised learning. Most approaches to phoneme recognition rely on mel frequency cepstrum based features to identify the phoneme class. In our approach, we consider the vocal tract area function along with mel frequency cepstrum coefficients and analyze the change in accuracy obtained by introducing it into the feature set. Support Vector Machines have been an attractive approach to pattern recognition, and their use as a supervised learning model has been popular in the speech processing community. We compare Support Vector Machines with other supervised learning models, such as the Naive Bayes, k-Nearest Neighbors and linear discriminant analysis classifiers, on our feature set. We impose a soft voting rule between the three best classifiers to produce our variation of a voting classifier. We enhance the accuracy of our classifier by using a priority based approach to estimate the three most likely phonemes after the predicted phoneme. Through a figurative and quantitative approach, we show that our modified algorithm outperforms other traditional methods. Experiments were conducted on WSJCAM0, a British English corpus.
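The soft voting rule and the estimate of the three next most likely phonemes can be illustrated with a minimal sketch. It is not the authors' implementation: the three classifiers, their probability values, the phoneme labels, and the function names `soft_vote` and `ranked_candidates` are all hypothetical, and ranking by averaged probability is an assumption standing in for the priority based approach, whose details are not given in the abstract.

```python
def soft_vote(prob_dists, weights=None):
    """Combine per-class probabilities from several classifiers by weighted averaging."""
    if weights is None:
        weights = [1.0] * len(prob_dists)  # equal weights (assumption)
    total = sum(weights)
    labels = prob_dists[0].keys()
    return {
        label: sum(w * dist[label] for w, dist in zip(weights, prob_dists)) / total
        for label in labels
    }


def ranked_candidates(combined, k=4):
    """Return the k labels with the highest combined probability, best first."""
    return sorted(combined, key=combined.get, reverse=True)[:k]


# Hypothetical posterior probabilities from three classifiers for one speech frame.
clf_a = {"aa": 0.6, "ae": 0.3, "ah": 0.1, "eh": 0.0}
clf_b = {"aa": 0.2, "ae": 0.5, "ah": 0.2, "eh": 0.1}
clf_c = {"aa": 0.5, "ae": 0.2, "ah": 0.2, "eh": 0.1}

combined = soft_vote([clf_a, clf_b, clf_c])
candidates = ranked_candidates(combined)

# The first candidate is the predicted phoneme; the next three are the alternatives.
predicted, alternatives = candidates[0], candidates[1:]
print(predicted, alternatives)
```

Averaging probabilities, rather than counting one vote per classifier, lets a confident classifier outweigh two uncertain ones, which is the usual motivation for soft over hard voting.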

Full Text