Abstract

A method for speaker-independent isolated digit recognition based on modeling entire words as discrete probabilistic functions of a Markov process is described. Training is a three-part process comprising conventional methods of linear prediction analysis and vector quantization of the LPCs, followed by an algorithm [L. E. Baum, Inequalities 3, 1–8 (1972)] for estimating the parameters of a hidden Markov process. Recognition uses the same linear prediction and vector quantization steps prior to maximum likelihood classification based on the Viterbi algorithm [A. J. Viterbi, IEEE Trans. Inf. Theory IT-13, 260–269 (1967)]. After training on a 1000-token set, recognition experiments were conducted on a separate 1000-token test set obtained from 100 new talkers. In this test, a 3.5% error rate was observed, which is comparable to that measured in an identical test of an LPC/DTW system [L. R. Rabiner et al., IEEE Trans. Acoust. Speech Signal Process. ASSP-27, 336–349 (1979)]. The computational demand for recognition under the new system is reduced by a factor of approximately ten in both time and memory compared to that of the LPC/DTW system. Issues of model structure selection, averaging techniques to obtain model stability, and methods of compensating for finite training set size are also discussed.
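To illustrate the recognition step the abstract describes, the following is a minimal sketch, not the authors' code: each digit is represented by a discrete-observation hidden Markov model, the vector-quantized LPC codebook indices of a test utterance are scored against every word model with the Viterbi algorithm, and the word whose model yields the highest likelihood is chosen. All function names, parameter shapes, and the use of log probabilities are illustrative assumptions.

```python
import numpy as np

def viterbi_log_likelihood(obs, log_pi, log_A, log_B):
    """Log-probability of the single best state path for one discrete HMM.

    obs    : sequence of VQ codebook indices (length T)
    log_pi : (N,)   log initial-state probabilities
    log_A  : (N, N) log state-transition probabilities
    log_B  : (N, M) log observation probabilities per state and codebook symbol
    """
    # Best-path score ending in each state after the first observation.
    delta = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # Extend every path by one transition, keeping only the best predecessor.
        delta = np.max(delta[:, None] + log_A, axis=0) + log_B[:, o]
    return np.max(delta)  # score of the overall best state path

def classify(obs, word_models):
    """Maximum-likelihood choice among per-word HMMs (e.g., digits '0'..'9')."""
    scores = {word: viterbi_log_likelihood(obs, *params)
              for word, params in word_models.items()}
    return max(scores, key=scores.get)
```

Because each word model is scored by a single dynamic-programming pass over a fixed number of states, rather than by time-warped comparison against many stored reference templates, this style of classifier accounts for the reduction in computation and storage relative to the LPC/DTW approach noted above.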
