Abstract

The performance of speech recognition systems is adversely affected by mismatch between training and testing environmental conditions. In addition to test data from noisy environments, there are scenarios where the training data itself is noisy. Speech enhancement techniques, which focus solely on finding a clean speech estimate from the noisy signal, are not effective here. Model adaptation techniques may also be ineffective due to the dynamic nature of the environment. In this paper, we propose a method for compensating the mismatch between training and testing environments using the "average eigenspace" approach when the mismatch is non-stationary. No explicit adaptation data is needed, as the method works on incoming test data to find the compensatory transform. This method differs from traditional signal-noise subspace filtering techniques, where the dimensionality of the clean signal space is assumed to be less than that of the noise space and noise is assumed to affect all dimensions to the same extent. We evaluate this approach on two corpora collected from real car environments: CU-Move and UTDrive. Using Sphinx, a relative reduction of 40-50% in WER is achieved compared to the baseline system. The method also reduces the dimensionality of the feature vectors, allowing for a more compact set of acoustic models in the phoneme space.
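The abstract does not give the exact formulation of the "average eigenspace" transform, so the following is only a minimal sketch of the general idea it describes: estimate an eigenspace from incoming test feature vectors (e.g., MFCC frames), keep the leading eigenvectors, and project features into that reduced subspace before decoding. All function names, the feature type, and the choice of subspace size k are illustrative assumptions, not the paper's method.

```python
import numpy as np

def estimate_eigenspace(test_features: np.ndarray, k: int):
    """Estimate a reduced eigenspace from incoming test data.

    test_features: (n_frames, n_dims) matrix of feature vectors (e.g., MFCCs).
    Returns the feature mean and the top-k eigenvectors of the covariance.
    """
    mean = test_features.mean(axis=0)
    centered = test_features - mean
    cov = centered.T @ centered / (len(test_features) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    return mean, eigvecs[:, order]              # (n_dims,), (n_dims, k)

def project(features: np.ndarray, mean: np.ndarray, basis: np.ndarray):
    """Project features into the reduced eigenspace (n_dims -> k)."""
    return (features - mean) @ basis
```

In this reading, the transform is fitted directly on the test utterances (no separate adaptation set) and applied to the features used for both acoustic model training and decoding, so that recognition takes place in the same compensated, lower-dimensional space.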
