Feature extraction and normalization for speech recognition

Eiko Yamada

doi:10.1121/1.424191

Abstract

Speech data is converted into logarithmic spectrum data and orthogonally transformed to produce feature vectors. Normalization coefficient data and unit vector data are stored. An inner product of the feature vector data and the unit vector data is calculated; it may be averaged over a word or a sentence, or computed as a regressive average. Using the inner product, the normalization coefficient data, and the unit vector data, a normalization vector is calculated in the transformed feature vector space. This vector corresponds to a second or higher order curve obtained by least-squares approximation of the speech data in logarithmic spectrum space. The feature vectors are normalized by subtracting the normalization vector from them in the transformed feature vector space. Recognition is then performed on the normalized feature vectors.
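
The steps in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation, and several choices are assumptions made only for illustration: the orthogonal transform is taken to be an orthonormal DCT-II, the unit vectors are obtained by orthonormalizing the polynomial basis 1, f, f^2 over a normalized frequency axis and mapping it into the transformed space, the normalization coefficients are all set to 1, and the inner products are averaged over all frames of one utterance. The function names `dct_matrix`, `curve_unit_vectors`, and `normalize_features` are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix (assumed stand-in for the orthogonal transform)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)
    return c


def curve_unit_vectors(n_bins, order, transform):
    """Unit vectors, in the transformed space, spanning polynomial curves
    of the given order on the log-spectrum axis (illustrative construction)."""
    f = np.linspace(-1.0, 1.0, n_bins)                  # normalized frequency axis
    powers = np.vander(f, order + 1, increasing=True)   # columns: 1, f, f^2, ...
    q, _ = np.linalg.qr(powers)                         # orthonormal columns
    return (transform @ q).T                            # one unit vector per row


def normalize_features(features, unit_vectors, coeffs):
    """Subtract the normalization vector from every frame.

    features:     (n_frames, n_bins) transformed feature vectors
    unit_vectors: (order + 1, n_bins) stored unit vectors
    coeffs:       (order + 1,) stored normalization coefficients
    """
    mean_feat = features.mean(axis=0)          # average over the utterance
    inner = unit_vectors @ mean_feat           # averaged inner products
    norm_vec = (coeffs * inner) @ unit_vectors # normalization vector
    return features - norm_vec, norm_vec


# Usage with synthetic data (random values, not real speech).
rng = np.random.default_rng(0)
n_bins, n_frames = 32, 50
C = dct_matrix(n_bins)
U = curve_unit_vectors(n_bins, order=2, transform=C)
log_spectra = rng.normal(size=(n_frames, n_bins))
features = log_spectra @ C.T                   # transform each frame
normalized, norm_vec = normalize_features(features, U, np.ones(3))
```

Because the assumed transform is orthogonal, inner products with the mapped unit vectors equal inner products taken in log-spectrum space. With unit coefficients, the normalization vector in this sketch is therefore the transform of the least-squares quadratic fit to the mean log spectrum, so a quadratic spectral tilt added to every frame leaves the normalized features unchanged.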
