Abstract

In this paper, we present a new approach to HMM adaptation that jointly compensates for additive and convolutive acoustic distortion in environment-robust speech recognition. The hallmark of this approach is the use of a nonlinear, phase-sensitive model of acoustic distortion that captures the phase asynchrony between the clean speech and the mixing noise. In the first step of the algorithm, both the static and dynamic portions of the noise and channel parameters are estimated in the cepstral domain, using the speech recognizer’s “feedback” information and vector-Taylor-series (VTS) linearization of the nonlinear phase-sensitive model. In the second step, the estimated noise and channel parameters are used to adapt the static and dynamic portions of the HMM means and variances, again using the linearized phase-sensitive acoustic distortion model. In experiments on the standard Aurora 2 task, the proposed algorithm achieves 93.3% accuracy with the clean-trained complex HMM back-end as the baseline system for unsupervised HMM adaptation. This is the highest accuracy reported in the literature on this task with a clean-trained HMM. The experimental results show that the phase term, which was missing in all previous HMM-adaptation work, contributes significantly to the high recognition accuracy achieved.
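To make the two steps concrete, the following is a minimal sketch (not the paper's implementation) of how a phase-sensitive distortion model can drive first-order VTS adaptation of a Gaussian. For simplicity it works per log-Mel channel rather than in the cepstral domain used in the paper, so the DCT matrix and its inverse are omitted; the function names, the scalar phase factor alpha, the diagonal-covariance assumption, and the treatment of the channel as deterministic are illustrative choices, not details taken from the abstract.

    import numpy as np

    def distort(mu_x, mu_h, mu_n, alpha):
        """Phase-sensitive distortion in the log-Mel domain (elementwise sketch).

        exp(y) = exp(x + h) + exp(n) + 2*alpha*exp((x + h + n)/2),
        where alpha in [-1, 1] models the phase asynchrony between speech and noise.
        """
        e_xh = np.exp(mu_x + mu_h)
        e_n = np.exp(mu_n)
        cross = 2.0 * alpha * np.exp(0.5 * (mu_x + mu_h + mu_n))
        return np.log(e_xh + e_n + cross)

    def jacobian_wrt_clean(mu_x, mu_h, mu_n, alpha):
        """Elementwise partial derivative dy/dx at the expansion point.

        With this elementwise form, dy/dn = 1 - dy/dx.
        """
        e_xh = np.exp(mu_x + mu_h)
        cross = alpha * np.exp(0.5 * (mu_x + mu_h + mu_n))
        denom = e_xh + np.exp(mu_n) + 2.0 * cross
        return (e_xh + cross) / denom

    def adapt_gaussian(mu_x, var_x, mu_h, mu_n, var_n, alpha):
        """First-order VTS adaptation of one diagonal Gaussian (static stream).

        The channel h is treated as deterministic; only clean-speech and noise
        variances propagate. Dynamic (delta) parameters follow the same
        linearization, e.g. adapted delta mean ~= g * clean delta mean.
        """
        mu_y = distort(mu_x, mu_h, mu_n, alpha)           # adapted mean
        g = jacobian_wrt_clean(mu_x, mu_h, mu_n, alpha)   # dy/dx; dy/dn = 1 - g
        var_y = g**2 * var_x + (1.0 - g)**2 * var_n       # adapted diagonal variance
        return mu_y, var_y

    # Example: adapt one 23-dimensional static mean/variance pair.
    mu_x = np.random.randn(23)
    var_x = np.ones(23)
    mu_h = np.zeros(23)
    mu_n = np.full(23, -2.0)
    var_n = 0.1 * np.ones(23)
    mu_y, var_y = adapt_gaussian(mu_x, var_x, mu_h, mu_n, var_n, alpha=0.5)

Setting alpha to zero recovers the familiar phase-insensitive VTS compensation; the abstract's point is that a nonzero phase term changes both the adapted means and the Jacobians used for the variances, which is where the reported accuracy gain comes from.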
