Speech recognition from adaptive windowing PSD estimation

Maryam Ravan, Soosan Beheshti

doi:10.1109/ccece.2011.6030506

Abstract

Speech-recognition technology is embedded in voice-activated routing systems at customer call centers, voice dialing on mobile phones, and many other everyday applications. Consequently, designing a robust speech-recognition system that adapts to acoustic conditions, such as the speaker's speech rate and accent, is of great interest. In this paper we present a machine learning approach to speech recognition using the k-nearest neighbor (k-NN) classifier. We choose a small vocabulary containing the two words “yes” and “no”, which can be used in personal emergency response systems. In this method, the power spectral density (PSD) of each frame of the speech signal is first estimated using the recently developed adaptive windowing PSD estimation technique. The most relevant features of the PSD of the frame sequence are then identified using a feature selection scheme. These features are then fed into the k-NN classifier for speech recognition. The proposed method achieves a recognition accuracy exceeding 90%.
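
The three stages described above (per-frame PSD estimation, feature selection, k-NN classification) can be illustrated with a minimal sketch. It is not the method of this paper: a fixed Hann-window periodogram stands in for the adaptive windowing PSD estimator, a two-class Fisher score stands in for the unspecified feature selection scheme, and noisy synthetic tones stand in for recordings of “yes” and “no”. All function names, the sampling rate, the frame length, and the value of k are assumptions chosen for illustration.

```python
import numpy as np

def frame_psd(signal, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames; return one periodogram per frame."""
    # Fixed Hann window: an assumed stand-in, NOT the adaptive windowing estimator.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    spectrum = np.fft.rfft(frames * window, axis=1)
    return np.abs(spectrum) ** 2 / np.sum(window ** 2)

def utterance_features(signal):
    """Average the log-PSD over frames to get one fixed-length vector per utterance."""
    return np.log(frame_psd(signal) + 1e-12).mean(axis=0)

def fisher_scores(X, y):
    """Two-class Fisher score per feature (assumed feature selection criterion)."""
    a, b = X[y == 0], X[y == 1]
    return (a.mean(0) - b.mean(0)) ** 2 / (a.var(0) + b.var(0) + 1e-12)

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

# Synthetic two-class data: noisy tones replace real "yes"/"no" utterances.
rng = np.random.default_rng(0)
fs = 8000                      # assumed sampling rate in Hz
t = np.arange(2048) / fs

def synth(label):
    freq = 500.0 if label == 0 else 1500.0
    return np.sin(2 * np.pi * freq * t) + 0.5 * rng.standard_normal(t.size)

y = np.array([0, 1] * 20)
X = np.stack([utterance_features(synth(c)) for c in y])
train, test = np.arange(0, 30), np.arange(30, 40)

# Select features on the training split only, then classify the held-out split.
top = np.argsort(fisher_scores(X[train], y[train]))[-8:]
pred = knn_predict(X[train][:, top], y[train], X[test][:, top], k=3)
accuracy = float((pred == y[test]).mean())
print(accuracy)
```

Selecting features on the training split alone keeps the held-out accuracy from being inflated by information leaked from the test utterances. The accuracy printed here reflects only the easy synthetic data and says nothing about the results reported in the paper.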
