Abstract

We introduce a new approach, called non-stationary adaptation (NA), for recognizing speech in non-stationary adverse environments. Two models are used: a speaker-independent hidden Markov model (HMM) for clean speech, and an ergodic Markov chain that represents the non-stationary adverse environment. Each state of the Markov chain represents one stationary adverse condition and is associated with an affine transform estimated by maximum likelihood linear regression (MLLR). Three kinds of adverse environment are considered: (i) multi-speaker speech recognition, in which the speaker identity changes randomly and thereby constitutes a non-stationary adverse condition; (ii) recognition of speech corrupted by machine-gun noise; and (iii) the crosstalk problem. The algorithm is tested on the Nov92 development database of WSJF0 with a vocabulary size of 20,000. In multi-speaker speech recognition, NA decreases the error rate by 13.6%. For speech corrupted by machine-gun noise, a one-state Markov chain decreases the error rate by 18%, and a two-state Markov chain gives a further 14% decrease. In the crosstalk problem, a one-state Markov chain decreases the error rate by 16.8%, while two-state and three-state Markov chains decrease it by 22% and 24.4%, respectively.
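
To make the two-model idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes one-dimensional features, a single shared variance, scalar affine transforms (a, b) standing in for the MLLR matrices, and a clean-speech mean that is already known for every frame (the real system would decode the speech HMM and the environment chain jointly). All names (`log_gauss`, `viterbi_env`, the toy values) are hypothetical. Under those assumptions, a Viterbi pass over the ergodic environment chain picks the most likely stationary condition for each frame.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_env(frames, clean_means, var, transforms, log_trans, log_init):
    """Most likely environment-state path, given per-frame clean means.

    transforms[k] = (a, b) is an assumed scalar stand-in for the affine
    transform attached to environment state k: adapted mean = a * mean + b.
    """
    n_states = len(transforms)

    def emit(t, k):
        a, b = transforms[k]
        return log_gauss(frames[t], a * clean_means[t] + b, var)

    score = [log_init[k] + emit(0, k) for k in range(n_states)]
    back = []
    for t in range(1, len(frames)):
        prev = score
        # Best predecessor for each state (ergodic: every transition allowed).
        ptr = [max(range(n_states), key=lambda i: prev[i] + log_trans[i][k])
               for k in range(n_states)]
        score = [prev[ptr[k]] + log_trans[ptr[k]][k] + emit(t, k)
                 for k in range(n_states)]
        back.append(ptr)

    # Trace back from the best final state.
    k = max(range(n_states), key=lambda i: score[i])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return path


# Toy data (assumed): state 0 = clean, state 1 = noise burst shifting the mean by +5.
frames = [0.0, 0.1, 5.0, 5.2, 0.05]
clean_means = [0.0] * len(frames)
transforms = [(1.0, 0.0), (1.0, 5.0)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
log_init = [math.log(0.5), math.log(0.5)]

path = viterbi_env(frames, clean_means, 1.0, transforms, log_trans, log_init)
print(path)
```

In this toy setting a one-state chain reduces to a single global transform, which mirrors the one-state results quoted in the abstract; adding states lets the transform switch as the condition changes from frame to frame.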
