Abstract

It is well known that the performance of automatic speech recognition (ASR) systems is easily affected by acoustic mismatch between training and testing conditions. This mismatch is often caused by various kinds of environmental noise or distortion. To reduce its effect, feature normalization, feature enhancement, model adaptation, and related approaches have been studied intensively. Cepstral mean normalization (CMN), mean and variance normalization (MVN), and histogram equalization (HEQ) are well-known feature normalization methods. Stereo-based piecewise linear compensation for environments (SPLICE) is a feature enhancement method. In this paper, we describe how to combine these methods to effectively improve the robustness of ASR systems. In experiments on the Aurora-2 database, a good combination yielded a 41% improvement in word error rate over SPLICE alone and a 25% improvement over the conventional combination of SPLICE and CMN.
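The three normalization methods named above can be illustrated with a minimal sketch. It assumes per-utterance statistics, features stored as a list of frames (each frame a list of cepstral coefficients), and a standard Gaussian as the HEQ reference distribution. The function names, data layout, and Gaussian target are illustrative assumptions, not the paper's implementation. SPLICE is not shown because it requires stereo (clean/noisy) training data.

```python
from statistics import NormalDist, fmean, pstdev
from typing import List

# Assumed layout: T frames x D cepstral coefficients (illustrative only).
Features = List[List[float]]


def _columns(x: Features) -> List[List[float]]:
    """Transpose frames into per-dimension trajectories."""
    return [list(col) for col in zip(*x)]


def _rows(cols: List[List[float]]) -> Features:
    """Transpose per-dimension trajectories back into frames."""
    return [list(row) for row in zip(*cols)]


def cmn(x: Features) -> Features:
    """CMN: subtract the utterance mean of each cepstral dimension."""
    out = []
    for c in _columns(x):
        m = fmean(c)
        out.append([v - m for v in c])
    return _rows(out)


def mvn(x: Features, eps: float = 1e-8) -> Features:
    """MVN: zero mean and (approximately) unit variance per dimension."""
    out = []
    for c in _columns(x):
        m, s = fmean(c), pstdev(c)
        out.append([(v - m) / (s + eps) for v in c])
    return _rows(out)


def heq(x: Features) -> Features:
    """HEQ: map each value's empirical CDF onto a reference CDF.

    Assumption: the reference is a standard Gaussian, and the empirical
    CDF is estimated from ranks within the utterance.
    """
    ref = NormalDist(0.0, 1.0)
    out = []
    for c in _columns(x):
        n = len(c)
        order = sorted(range(n), key=lambda i: c[i])
        col = [0.0] * n
        for rank, i in enumerate(order):
            # The (rank + 0.5) / n midpoint keeps the CDF strictly inside (0, 1).
            col[i] = ref.inv_cdf((rank + 0.5) / n)
        out.append(col)
    return _rows(out)
```

CMN removes only a constant channel offset, MVN also equalizes the scale, and HEQ matches the whole distribution, so each step compensates for a broader class of mismatch than the previous one.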
