Abstract

A speech recognition system converts spoken utterances into text. Its accuracy depends on the acoustic models, which are trained on features extracted from the available training data and then used to recognise the spoken text. In the conventional feature extraction method, features are extracted using a single window size (say, 20 ms). Instead of this fixed window size, we propose to extract features from the same speech signal using multiple window sizes. Each window size yields its own set of feature vectors for the same word, which increases the number of training examples. Experiments show that when features are extracted with multiple window sizes, the variation among the feature vectors increases considerably, which leads to better acoustic models. This multiresolution feature extraction technique is successfully used to build a speech recogniser. To analyse its performance, an isolated-word speech recognition system is developed for the TIMIT speech corpus. Results reveal an improvement of around 8% in recognition accuracy over the conventional single-resolution feature extraction method.
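
The idea of extracting several feature sets from one signal can be illustrated with a minimal sketch. The abstract does not specify the feature type, the window sizes, or the frame shift, so everything below is an assumption made for illustration: the function names are hypothetical, the window sizes of 15, 20 and 25 ms and the 10 ms hop are placeholder values, and simple log band energies stand in for whatever features the authors actually used.

```python
import numpy as np


def frame_signal(signal, sample_rate, window_ms, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames of window_ms milliseconds."""
    signal = np.asarray(signal, dtype=float)
    win = int(round(sample_rate * window_ms / 1000.0))
    hop = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < win:
        return np.empty((0, win))
    n_frames = 1 + (len(signal) - win) // hop
    # Each row of idx holds the sample indices of one frame.
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def log_band_energies(frames, n_bands=13):
    """Stand-in feature (assumption): log energy in n_bands equal-width bands."""
    windowed = frames * np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(windowed, axis=1)) ** 2
    bands = np.array_split(power, n_bands, axis=1)
    energies = np.stack([band.sum(axis=1) for band in bands], axis=1)
    # Small constant avoids log(0) on silent frames.
    return np.log(energies + 1e-10)


def multiresolution_features(signal, sample_rate,
                             window_sizes_ms=(15.0, 20.0, 25.0)):
    """Return one feature matrix per window size for the same signal."""
    return {
        w: log_band_energies(frame_signal(signal, sample_rate, w))
        for w in window_sizes_ms
    }


if __name__ == "__main__":
    # One second of synthetic noise stands in for a recorded word.
    rng = np.random.default_rng(0)
    word = rng.standard_normal(16000)
    for window_ms, feats in multiresolution_features(word, 16000).items():
        print(window_ms, feats.shape)
```

Because the number of bands is fixed, every window size produces feature vectors of the same dimension, so under this sketch the resulting sets could be pooled as additional training examples for the same word.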
