Abstract

A speaker-independent isolated word recognizer is proposed. It is obtained by concatenating a Bayesian neural network and a Hopfield time-alignment network. In this system, the Bayesian network outputs the a posteriori probability for each speech frame, and the Hopfield network that follows it performs the time warping. The Bayesian network is first trained with a proposed splitting Learning Vector Quantization (LVQ) algorithm, derived from the LBG clustering algorithm and the Kohonen LVQ algorithm; the LVQ2 algorithm is then applied as a final refinement step. Each word pattern is characterized by a continuous mixture of Gaussian densities for each frame and by multiple templates for each word. Experimental evaluation of this system with four templates per word and five mixtures per frame, using isolated-word databases (10 digits and 30 city names) from 53 speakers (28 males, 25 females), gave average recognition accuracies of 97.3% in the speaker-trained mode and 95.7% in the speaker-independent mode. Comparisons with the K-means and DTW algorithms show that the integration of the splitting LVQ and LVQ2 algorithms makes this system well suited to speaker-independent isolated word recognition. A cookbook approach for determining the parameters of the Hopfield time-alignment network is also described.
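
For readers unfamiliar with the time-warping step, the following is a minimal sketch of the classical DTW baseline that the abstract compares against, not the Hopfield time-alignment network itself. The function names `dtw_distance` and `recognize`, the Euclidean frame distance, the unconstrained step pattern, and the toy one-dimensional frames are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature frames.

    Each frame is a tuple of floats. The local cost is the Euclidean
    distance between frames (an assumption, not taken from the paper).
    """
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Allowed moves: advance in a, advance in b, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(frames, templates):
    """Return the word whose best-matching template has the lowest DTW cost.

    `templates` maps each word to a list of reference sequences, mirroring
    the multiple-templates-per-word idea (hypothetical data layout).
    """
    best_word, best_cost = None, math.inf
    for word, refs in templates.items():
        for ref in refs:
            c = dtw_distance(frames, ref)
            if c < best_cost:
                best_word, best_cost = word, c
    return best_word


# Toy usage with made-up one-dimensional frames.
templates = {
    "one": [[(0.0,), (1.0,), (2.0,)]],
    "two": [[(5.0,), (5.0,)], [(6.0,), (4.0,)]],
}
utterance = [(0.0,), (1.0,), (1.0,), (2.0,)]
result = recognize(utterance, templates)
```

In this toy case the utterance repeats one frame of the "one" template, so warping absorbs the repetition and that template is selected. In the system described above, the per-frame local score would presumably come from the Bayesian network's posterior probabilities rather than a raw Euclidean distance.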
