Abstract
A mathematical framework based on maximum likelihood stochastic matching is proposed to perform feature and model compensation for robust speech recognition. Speech recognition is often formulated as a matching problem between the feature vectors extracted from a test utterance and a set of speech models or patterns obtained from a training corpus. It is well known that a speech recognizer often degrades in performance when the testing data are not acoustically similar to the training data. One way to improve robustness is to find features that are invariant under all acoustic conditions and distortions. Because such features are difficult to find in practice, some form of compensation is often required. The proposed stochastic matching approach assumes a structure, or functional form, for the feature and/or model transformations. Together with a set of nuisance parameters, these transformations approximate the distortion in the test utterance. To decrease the acoustic mismatch between a test utterance and a given set of speech models, e.g., hidden Markov models, the stochastic matching algorithm estimates the nuisance parameters and then applies the feature/model transformations during speech recognition. Simple channel distortion can be approximated with linear transformations. For more complicated distortions, such as environmental, speaker, and combined mismatches, nonlinear compensating transformations are needed. When utterances are affected by additive ambient noise and convolutional channel distortion, these compensations give a significant improvement in recognition performance over systems that do not use them.
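The estimate-then-compensate loop described above can be illustrated with a minimal sketch. It is not the paper's implementation and rests on several simplifying assumptions: a Gaussian mixture with diagonal covariances stands in for the hidden Markov models, the distortion is a single additive bias vector (the form a stationary convolutional channel takes for cepstral features), and that bias is the only nuisance parameter. All function names and the synthetic data are hypothetical.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means, variances):
    """Component posteriors P(k | x) under the mixture, computed stably."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_bias(frames, weights, means, variances, n_iter=10):
    """EM estimate of an additive bias b so that (frame - b) fits the model."""
    dim = len(frames[0])
    bias = [0.0] * dim
    for _ in range(n_iter):
        num = [0.0] * dim
        den = [0.0] * dim
        for y in frames:
            # E-step: posteriors of the frame compensated with the current bias.
            x = [yi - bi for yi, bi in zip(y, bias)]
            gamma = posteriors(x, weights, means, variances)
            # Accumulate variance-weighted residuals for the M-step.
            for g, m, v in zip(gamma, means, variances):
                for d in range(dim):
                    num[d] += g * (y[d] - m[d]) / v[d]
                    den[d] += g / v[d]
        # M-step: closed-form bias update per dimension.
        bias = [n / dn for n, dn in zip(num, den)]
    return bias


def compensate(frames, bias):
    """Apply the feature-space transformation: subtract the estimated bias."""
    return [[yi - bi for yi, bi in zip(y, bias)] for y in frames]


def synthetic_demo(seed=0, n_frames=400):
    """Draw frames from a toy 2-component model, distort them, re-estimate."""
    rng = random.Random(seed)
    weights = [0.5, 0.5]
    means = [[-2.0, 0.0], [2.0, 1.0]]
    variances = [[0.25, 0.25], [0.25, 0.25]]
    true_bias = [0.7, -0.4]  # assumed channel offset, for illustration only
    frames = []
    for _ in range(n_frames):
        k = 0 if rng.random() < weights[0] else 1
        clean = [
            rng.gauss(m, math.sqrt(v)) for m, v in zip(means[k], variances[k])
        ]
        frames.append([c + b for c, b in zip(clean, true_bias)])
    estimated = estimate_bias(frames, weights, means, variances)
    return true_bias, estimated
```

Calling `synthetic_demo()` returns the assumed bias together with its estimate, and `compensate` can then be applied to the distorted frames before they are scored against the models. A model-space variant would instead shift the model means by the estimated bias. The nonlinear transformations mentioned for more complicated distortions are outside the scope of this sketch.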