Abstract

This paper presents a feature compensation technique based on minimum mean square error (MMSE) estimation for robust speech recognition. Similar to other MMSE compensation methods based on stereo data, our approach models the differences between the clean and noisy feature spaces, and the resulting MMSE estimate of the clean feature vector is obtained as a piece-wise linear transformation of the noisy one. However, unlike other well-known MMSE techniques such as SPLICE or MEMLIN, which model the feature spaces with GMMs, in our proposal each feature space is characterized by a set of cells obtained by means of vector quantization (VQ). This VQ-based approach allows a very efficient implementation of the MMSE estimator. In addition, the degradation inherent to any VQ process is mitigated by a strategy that considers different subregions inside each cell and applies a subregion-based mean and variance compensation. The experimental results show that, along with a very efficient MMSE estimator, our technique achieves even better recognition accuracies than SPLICE and MEMLIN.
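The abstract does not spell out the estimator in full, so the following is a minimal, hypothetical sketch of the general idea it describes: the noisy feature space is partitioned into VQ cells, per-cell statistics of the clean and noisy features are learned from stereo (frame-aligned) data, and at test time each noisy vector is mapped through the linear transform of its cell. All function names and parameters here are illustrative, and the per-cell subregion refinement mentioned in the abstract is omitted for brevity; this is not the authors' exact formulation.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq


def train_vq_compensation(clean, noisy, n_cells=64):
    """Fit a piece-wise linear clean-feature estimator from stereo data.

    clean, noisy : (n_frames, n_dims) arrays of frame-aligned features.
    The noisy space is partitioned into VQ cells; for each cell we store
    the mean/std of the noisy and clean frames that fall inside it.
    """
    # Partition the noisy feature space with k-means (the VQ codebook).
    codebook, labels = kmeans2(noisy, n_cells, minit='points')

    dims = clean.shape[1]
    stats = {
        'codebook': codebook,
        'mu_y': np.zeros((n_cells, dims)), 'sd_y': np.ones((n_cells, dims)),
        'mu_x': np.zeros((n_cells, dims)), 'sd_x': np.ones((n_cells, dims)),
    }
    for k in range(n_cells):
        idx = labels == k
        if not np.any(idx):
            continue  # empty cell: keep identity-like defaults
        stats['mu_y'][k] = noisy[idx].mean(axis=0)
        stats['sd_y'][k] = noisy[idx].std(axis=0) + 1e-6
        stats['mu_x'][k] = clean[idx].mean(axis=0)
        stats['sd_x'][k] = clean[idx].std(axis=0) + 1e-6
    return stats


def compensate(noisy, stats):
    """Map noisy features to clean-feature estimates, one linear map per cell."""
    # Cell lookup is a nearest-codeword search, which is what makes the
    # VQ-based estimator cheap at run time.
    labels, _ = vq(noisy, stats['codebook'])
    mu_y, sd_y = stats['mu_y'][labels], stats['sd_y'][labels]
    mu_x, sd_x = stats['mu_x'][labels], stats['sd_x'][labels]
    # Per-cell mean/variance compensation: a piece-wise linear transform.
    return mu_x + (sd_x / sd_y) * (noisy - mu_y)


if __name__ == '__main__':
    # Toy usage with synthetic stereo data (a noisy copy of the clean features).
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(2000, 13))
    noisy = clean + rng.normal(0.5, 0.2, size=clean.shape)
    stats = train_vq_compensation(clean, noisy, n_cells=32)
    estimate = compensate(noisy, stats)
```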
