Abstract

The performance of speech recognition systems is adversely affected by mismatch between training and test conditions caused by environmental factors. Beyond the familiar case of noisy test data, there are scenarios where the training data itself is noisy. In this study, we propose a series of methods for compensating the mismatch between training and test environments, based on our "average eigenspace" approach. These methods are also shown to be effective under non-stationary mismatch conditions. An advantage is that no explicit adaptation data is needed, since the method is applied directly to incoming test data to find the compensatory transform. We evaluate these approaches on two separate corpora collected in realistic car environments: CU-Move and UTDrive. Compared with a baseline system incorporating spectral subtraction, highpass filtering, and cepstral mean normalization, the proposed techniques yield a relative word error rate reduction of 17-26%. They also reduce the dimensionality of the feature vectors, allowing a more compact set of acoustic models in the phoneme space; this property is important for automatic speech recognition on small-footprint mobile devices, such as cell phones or PDAs, which require ASR in diverse environments.
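The abstract does not spell out the "average eigenspace" construction, so the following is only a generic sketch of one plausible reading: average the covariance structure of training- and test-condition features, then project onto the leading eigenvectors of that averaged space, which also performs the dimensionality reduction mentioned above. The function name, the equal-weight averaging, and the use of covariance matrices are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def average_eigenspace_projection(train_feats, test_feats, k):
    """Sketch of an average-eigenspace transform (illustrative only).

    train_feats, test_feats: arrays of shape (frames, dims), e.g. MFCC frames.
    k: number of leading eigenvectors to keep (k < dims), giving the
    dimensionality reduction described in the abstract.
    """
    # Covariance of each condition's feature vectors.
    cov_train = np.cov(train_feats, rowvar=False)
    cov_test = np.cov(test_feats, rowvar=False)

    # Assumed equal-weight "average" of the two condition eigenspaces,
    # here realized by averaging the covariance matrices.
    cov_avg = 0.5 * (cov_train + cov_test)

    # eigh returns eigenvalues in ascending order; keep the k largest.
    _, eigvecs = np.linalg.eigh(cov_avg)
    basis = eigvecs[:, -k:]

    # Project test features into the shared k-dimensional space.
    return test_feats @ basis, basis
```

In this reading, acoustic models would be trained on similarly projected training features, so both sides of the recognizer live in the same compact space; no separate adaptation pass over held-out data is required, matching the abstract's claim.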
