Multiresolution feature extraction (MRFE) based speech recognition system

M Anbu Swarna Priyanka, P Vijayalakshmi, T Nagarajan, V Sherlin Solomi

doi:10.1109/icrtit.2013.6844197

Abstract

A speech recognition system converts spoken utterances into text. Its accuracy depends on the acoustic models, which are trained on features extracted from the available training data and then used to recognise the spoken text. In the conventional feature extraction method, features are extracted using a single window size (say, 20 ms). Instead of this fixed window size, we propose to extract features from the same speech signal using multiple window sizes. When multiple window sizes are used, multiple sets of feature vectors are derived for the same word, thereby increasing the number of training examples. Experiments show that when features are extracted with multiple window sizes, the variation among the feature vectors increases considerably, which leads to better acoustic models. This multiresolution feature extraction technique is used successfully to build a speech recogniser. To analyse the performance of multiresolution feature extraction, an isolated-word speech recognition system is developed for the TIMIT speech corpus. Results reveal an improvement of around 8% in recognition accuracy over the conventional single-resolution feature extraction method.
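The idea of framing one signal at several window sizes can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names only a 20 ms window, so the 15 ms and 25 ms sizes, the 10 ms hop, the log band-energy features, and the function names `frame_signal`, `log_band_energies`, and `multiresolution_features` are all assumptions chosen for illustration.

```python
import numpy as np


def frame_signal(signal, sample_rate, window_ms, hop_ms=10.0):
    """Slice a 1-D signal into overlapping frames of window_ms milliseconds."""
    win = int(round(sample_rate * window_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < win:
        return np.empty((0, win))
    n_frames = 1 + (len(signal) - win) // hop
    # Row i holds samples [i*hop, i*hop + win).
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def log_band_energies(frames, n_fft=512, n_bands=13):
    """Stand-in feature (assumption): log energy in n_bands equal-width bands.

    A fixed n_fft keeps the feature dimension identical for every window
    size, so vectors from different resolutions can be pooled. Frames are
    assumed to be no longer than n_fft samples.
    """
    windowed = frames * np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(windowed, n=n_fft, axis=1)) ** 2
    bands = np.array_split(power, n_bands, axis=1)
    energies = np.stack([b.sum(axis=1) for b in bands], axis=1)
    return np.log(energies + 1e-10)


def multiresolution_features(signal, sample_rate,
                             window_sizes_ms=(15.0, 20.0, 25.0)):
    """Return one feature matrix per window size (sizes are assumptions)."""
    return {
        w: log_band_energies(frame_signal(signal, sample_rate, w))
        for w in window_sizes_ms
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    utterance = rng.standard_normal(sr)  # 1 s of noise as placeholder audio
    feats = multiresolution_features(utterance, sr)
    pooled = np.vstack(list(feats.values()))
    for w, f in feats.items():
        print(f"{w} ms window: {f.shape[0]} feature vectors")
    print("pooled training examples:", pooled.shape[0])
```

With a single 20 ms window the placeholder utterance yields one set of feature vectors; pooling the three resolutions roughly triples the number of training examples for the same word, which is the effect the abstract describes. In a real system the stand-in band energies would be replaced by whatever front-end features the recogniser uses.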

Full Text