Abstract

The paper reports improvements in speech recognition accuracy obtained by using more sophisticated time analysis as part of the feature selection process. The recognition methodology utilises hidden Markov modelling with continuous density functions. As speech features, the authors propose linear transformations of the vector formed by concatenating successive time samples of the cepstrum. The Taylor series, the Legendre polynomial transform and the discrete cosine transform share several properties with principal components analysis. These transforms are expected to improve recognition accuracy by incorporating higher-order time derivatives of the spectral information (such as the second time derivative) while also producing an essentially diagonal covariance matrix. In an experimental evaluation of these ideas, accuracy in speaker-independent recognition of the E-set of the alphabet improved from 55% with no time-varying information, to 68% with first-order time-varying information, and to 74% when second-order time-varying information was also included.
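The idea of applying a linear transform along the time axis of stacked cepstral frames can be illustrated with a short sketch. It is not the authors' implementation. It assumes a window of 2K+1 frames centred on each frame, edge frames repeated at the utterance boundaries, and discrete orthogonal polynomials of degree 0 to 2 as a stand-in for the Legendre polynomial transform (rows of an orthonormal DCT could be substituted). The function names `orthogonal_poly_basis` and `time_transform_features` are hypothetical. Because the basis rows are orthogonal, the three coefficients separate the window mean, a least-squares slope (first-order time information) and a curvature term (second-order time information).

```python
import numpy as np


def orthogonal_poly_basis(K, order=2):
    """Hypothetical helper: rows are discrete orthogonal polynomials of
    degree 0..order sampled at n = -K..K (only degrees up to 2 built)."""
    assert order in (0, 1, 2), "this sketch only builds degrees 0-2"
    n = np.arange(-K, K + 1, dtype=float)
    # n**2 - K(K+1)/3 has zero sum over the symmetric window, so it is
    # orthogonal to both the constant row and the linear row.
    rows = [np.ones_like(n), n, n**2 - K * (K + 1) / 3.0]
    return np.stack(rows[: order + 1])


def time_transform_features(cepstra, K=2, order=2):
    """Hypothetical feature extractor (an assumption, not the paper's code).

    cepstra: array of shape (T, D), one cepstral vector per frame.
    Returns shape (T, order + 1, D): coefficient 0 is the window mean,
    1 the least-squares slope, 2 the quadratic (curvature) coefficient.
    """
    basis = orthogonal_poly_basis(K, order)  # (order+1, 2K+1)
    # Dividing by each row's squared norm turns the inner product into a
    # least-squares projection onto that polynomial.
    basis = basis / np.sum(basis**2, axis=1, keepdims=True)
    # Repeat the first and last frames so every frame has a full window.
    padded = np.pad(cepstra, ((K, K), (0, 0)), mode="edge")
    T = cepstra.shape[0]
    # windows[t] holds frames t-K .. t+K of the original sequence.
    windows = np.stack([padded[t : t + 2 * K + 1] for t in range(T)])
    # Project the time axis (n) of every window onto the basis rows (k).
    return np.einsum("kn,tnd->tkd", basis, windows)
```

For a cepstral trajectory that is exactly quadratic in time, an interior frame's slope coefficient equals the first derivative at that frame and the curvature coefficient equals half the second derivative, which is why the degree-1 and degree-2 rows carry the first- and second-order time-varying information described above.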
