Modified Mean and Variance Normalization: Transforming to Utterance-Specific Estimates

Vikas Joshi, N Vishnu Prasad, S Umesh

doi:10.1007/s00034-015-0129-y

Abstract

Cepstral mean and variance normalization (CMVN) is an efficient noise compensation technique widely used in many speech applications. CMVN eliminates the mismatch between training and test utterances by transforming them to zero mean and unit variance. In this work, we argue that some useful information is lost during normalization, as every utterance is forced to have the same first- and second-order statistics, i.e., zero mean and unit variance. We propose to modify the CMVN methodology to retain the useful information and yet compensate for noise. The proposed normalization approach transforms every test utterance to an utterance-specific clean mean (i.e., the utterance mean if noise were absent) and clean variance, instead of zero mean and unit variance. We derive expressions to estimate the clean mean and variance from a noisy utterance. The proposed normalization is effective in recognizing voice commands that are typically short (single words or short phrases), where more advanced methods, such as histogram equalization (HEQ), are not effective. Recognition results show a relative improvement (RI) of 21% in word error rate over conventional CMVN on the Aurora-2 database, and an RI of 20% and 11% over CMVN and HEQ, respectively, on short utterances of the Aurora-2 database.
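
To make the contrast concrete, the following is a minimal Python sketch of the two normalization targets described in the abstract. It is not the authors' implementation: the function name `cmvn` and the parameters `target_mean` and `target_std` are hypothetical, and the abstract does not give the expressions for estimating the clean mean and variance from a noisy utterance, so those estimates are assumed to be supplied by the caller (with `target_std` taken as the square root of the estimated clean variance).

```python
from statistics import fmean, pstdev


def cmvn(frames, target_mean=None, target_std=None, eps=1e-8):
    """Normalize one utterance, given as a list of frames (lists of cepstral coefficients).

    With no targets, this is conventional CMVN: each cepstral dimension is
    mapped to zero mean and unit variance over the utterance.
    With targets, each dimension is instead mapped to the supplied mean and
    standard deviation. Here the targets are ASSUMED inputs standing in for
    the utterance-specific clean estimates; how to estimate them is not
    specified in the abstract.
    """
    dims = list(zip(*frames))  # one tuple of values per cepstral dimension
    if target_mean is None:
        target_mean = [0.0] * len(dims)
    if target_std is None:
        target_std = [1.0] * len(dims)

    normalized_dims = []
    for values, m_target, s_target in zip(dims, target_mean, target_std):
        mu = fmean(values)                # utterance mean for this dimension
        sigma = pstdev(values, mu) + eps  # utterance std; eps avoids division by zero
        normalized_dims.append(
            [(v - mu) / sigma * s_target + m_target for v in values]
        )

    # Transpose back to a list of frames.
    return [list(frame) for frame in zip(*normalized_dims)]


# Toy utterance: 3 frames, 2 cepstral dimensions (made-up values).
utterance = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
conventional = cmvn(utterance)
modified = cmvn(utterance, target_mean=[0.5, -1.0], target_std=[2.0, 0.5])
```

In this sketch, conventional CMVN is simply the special case in which every utterance shares the same target (zero mean, unit standard deviation), whereas the modified form lets the target vary per utterance.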
