Abstract

We analyze the performance of a speaker-independent continuous speech recognition system that combines a Hidden Markov Model (HMM) with an Artificial Neural Network. Modern speech recognition systems combine standard techniques on top of a basic approach to improve recognition accuracy. One such combination that has gained attention is the hybrid model. Our hybrid system for continuous speech recognition uses an HMM at the front end and a Radial Basis Function Neural Network (RBFNN) at the back end. The speech recognition process consists of a training phase and a recognition phase. In the training phase, the speech sentences are pre-processed and features are extracted. The extracted feature vectors are clustered into a model database by the HMM and used to train the RBFNN. In the recognition phase, the continuous sentence is pre-processed and its feature vector is modelled, then compared against the models stored in the database during training. The stored model that matches with the least error is taken as the recognized output. From the recognized output, the word error rate is computed as a measure of the recognition performance of the hybrid model. The audio files of continuous sentences are taken from the TIMIT database. Our hybrid HMM/RBFNN system achieves a 65% recognition rate.
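
The abstract names word error rate as the performance measure but does not spell out how it is computed. The following is a minimal sketch of the conventional definition (word-level edit distance divided by the number of reference words), not the authors' scoring code; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()   # assumption: words are separated by whitespace
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words.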

Highlights

  • The accuracy achievable for speaker-independent continuous speech is not so apparent

  • The performance of the hybrid Hidden Markov Model (HMM)/Radial Basis Function Neural Network (RBFNN) model was analyzed with continuous speech sentences taken from the TIMIT database

  • This study covers a continuous-sentence, speaker-independent, large-vocabulary speech recognition system built on a hybrid model, with an HMM at the front end and a radial basis function neural network as the classifier

Summary

Introduction

The accuracy achievable for speaker-independent continuous speech is not so apparent. A further difficulty arises when the task is speaker independent and recognition is performed over a large-vocabulary speech corpus. The feature vector consists of cepstral coefficients, obtained by taking the FFT of a short-time window of speech and keeping the most significant coefficients. Each HMM state has a statistical distribution that is a mixture of diagonal-covariance Gaussians, which gives a likelihood for each observed vector. A large-vocabulary system gives reduced accuracy in terms of Word Error Rate.
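
The two ingredients described above, cepstral features from a short-time window and a per-state mixture of diagonal-covariance Gaussians, can be sketched as follows. This is an illustration under stated assumptions, not the system's actual front end: it computes a plain real cepstrum (the source does not say whether a mel-scaled variant is used), applies a Hamming window, and substitutes a naive DFT for the FFT so that the sketch needs only the standard library. The names `real_cepstrum` and `diag_gmm_log_likelihood` are hypothetical.

```python
import cmath
import math


def real_cepstrum(frame, num_coeffs):
    """Return the first num_coeffs real-cepstrum coefficients of one frame."""
    n = len(frame)  # assumes at least two samples per frame
    # Hamming window to reduce spectral leakage (window choice is an assumption).
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    # Naive O(n^2) DFT, standing in for an FFT to stay dependency-free.
    spectrum = [sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)) for k in range(n)]
    # Log magnitude spectrum; the small floor avoids log(0).
    log_mag = [math.log(abs(c) + 1e-10) for c in spectrum]
    # Inverse DFT of the log spectrum, keeping only the low-order coefficients.
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n)
                for k in range(n)).real / n for q in range(num_coeffs)]


def diag_gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of vector x under a diagonal-covariance Gaussian mixture."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        # With a diagonal covariance, the density factorizes over dimensions.
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(l - peak) for l in component_logs))
```

In an HMM recognizer, a function like `diag_gmm_log_likelihood` would be evaluated once per state for each observed feature vector; the diagonal covariance keeps that cost linear in the feature dimension.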
