Orthogonal transformations of stacked feature vectors applied to HMM speech recognition

Michael J Flaherty, David B Roe

doi:10.1049/ip-i-2.1993.0017

Abstract

The paper reports improvements in speech recognition accuracy obtained by using more sophisticated time analysis as part of the feature selection process. The recognition methodology utilises hidden Markov modelling with continuous density functions. The authors propose using, as speech features, linear transformations of the vector formed by stacking successive time samples of the cepstrum. The Taylor series, the Legendre polynomial transform and the discrete cosine transform share several properties with principal components analysis. These transforms are expected to improve speech recognition accuracy by incorporating higher-order time derivatives of the spectral information (such as the second time derivative) while producing an essentially diagonal covariance. In an experimental evaluation of these ideas, accuracy in speaker-independent recognition of the E-set of the alphabet improved from 55% with no time-varying information to 68% with first-order time-varying information, and to 74% when second-order time-varying information was included.
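The feature construction described above (stack successive cepstral frames, then apply an orthogonal transform along the time axis) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the window half-width, the edge handling (repeating the first/last frame) and the number of retained basis rows are all hypothetical choices. The Legendre transform is approximated here by discrete orthonormal polynomials on equally spaced points, built by Gram-Schmidt on the monomials 1, t, t^2; the Taylor series transform and principal components analysis are not implemented. Row 0 of either basis captures the static (mean) cepstrum, row 1 a slope (first-derivative-like) term and row 2 a curvature (second-derivative-like) term.

```python
import math
from typing import List

Frame = List[float]         # one cepstral vector
Matrix = List[List[float]]  # rows are basis vectors over time


def stack_frames(cepstra: List[Frame], half_width: int) -> List[List[Frame]]:
    """For each frame t, collect frames t-half_width .. t+half_width.

    Assumption: at the utterance edges the first/last frame is repeated."""
    n = len(cepstra)
    return [[cepstra[min(max(t + o, 0), n - 1)]
             for o in range(-half_width, half_width + 1)]
            for t in range(n)]


def dct_basis(n: int, order: int) -> Matrix:
    """Rows 0..order-1 of the orthonormal DCT-II matrix of size n."""
    rows = []
    for k in range(order):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                     for j in range(n)])
    return rows


def discrete_legendre_basis(n: int, order: int) -> Matrix:
    """Orthonormal polynomials of degree 0..order-1 on n equally spaced points
    (time centred at 0), via Gram-Schmidt on the monomials 1, t, t^2, ..."""
    ts = [j - (n - 1) / 2.0 for j in range(n)]
    rows: Matrix = []
    for k in range(order):
        v = [t ** k for t in ts]
        for r in rows:  # remove components along lower-degree polynomials
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def transform_window(window: List[Frame], basis: Matrix) -> List[Frame]:
    """Project each cepstral coefficient's time trajectory onto each basis row:
    result[k][d] = sum_j basis[k][j] * window[j][d]."""
    dims = len(window[0])
    return [[sum(row[j] * window[j][d] for j in range(len(window)))
             for d in range(dims)]
            for row in basis]


if __name__ == "__main__":
    # Toy "cepstra": coefficient 0 rises linearly, coefficient 1 quadratically.
    cepstra = [[float(t), float(t * t)] for t in range(7)]
    windows = stack_frames(cepstra, half_width=2)
    for name, basis in [("legendre", discrete_legendre_basis(5, 3)),
                        ("dct", dct_basis(5, 3))]:
        feats = transform_window(windows[3], basis)
        print(name, [[round(x, 3) for x in row] for row in feats])
```

In this toy case the degree-2 Legendre feature of the linearly rising coefficient is zero, and its degree-1 feature equals the least-squares slope scaled by the square root of the sum of squared time offsets, i.e. the usual regression "delta" up to a constant. Because both bases are orthonormal, the transform is a rotation of the stacked vector, which is what allows it to be compared with principal components analysis.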
