This paper investigates speech recognition in additive background noise, assuming no knowledge of the noise characteristics. A new method, universal compensation (UC), is proposed as a solution to the problem. The UC method extends the missing-feature method: recognition is still based only on reliable data, but it is made robust to any corruption type, including full corruption, in which the noise affects all time-frequency components of the speech representation. The UC technique achieves robustness to unknown, full noise corruption through a novel combination of multicondition training and the missing-feature method. Multicondition training is used to convert full-band spectral corruption into partial-band spectral corruption; this is achieved by training the model on data containing simulated wide-band noise at different signal-to-noise ratios. The missing-feature principle is then used to reduce the effect of the remaining partial-band corruption by basing recognition only on the spectral components that are matched or compensated by the multicondition training. The combination of these two strategies makes the new method potentially capable of dealing with arbitrary additive noise (with arbitrary temporal-spectral characteristics) using only clean speech training data and simulated noise data, without requiring knowledge of the actual noise. Two databases, Aurora 2 and an E-set word database, have been used to evaluate the UC method. Experiments on Aurora 2 indicate that the new model has the potential to achieve recognition performance close to that of a multicondition baseline model trained on data from the test environments. Further experiments on noise conditions unseen in Aurora 2 show a significant performance improvement for the new model over the multicondition model. The experimental results on the E-set database demonstrate the ability of the UC model to deal with acoustically confusing recognition tasks.
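
The multicondition training step described above, in which clean speech is mixed with simulated wide-band noise at several signal-to-noise ratios, can be illustrated with a minimal sketch. This is not the paper's implementation. The white Gaussian noise model, the SNR levels, the sine-tone stand-in for clean speech, and the function names `add_noise_at_snr`, `measured_snr_db`, and `build_multicondition_set` are all assumptions chosen for illustration.

```python
import math
import random


def add_noise_at_snr(clean, snr_db, rng):
    """Return clean + white Gaussian noise scaled to the requested SNR in dB.

    White Gaussian noise is an assumed stand-in for "simulated wide-band noise".
    """
    signal_power = sum(x * x for x in clean) / len(clean)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    noise_power = sum(n * n for n in noise) / len(noise)
    # Scale the noise so that signal_power / noise_power == 10 ** (snr_db / 10).
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [x + gain * n for x, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """Compute the SNR in dB of a noisy signal given its clean reference."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


def build_multicondition_set(clean, snrs_db, seed=0):
    """Create one noisy copy of the clean signal per SNR level."""
    rng = random.Random(seed)
    return {snr: add_noise_at_snr(clean, snr, rng) for snr in snrs_db}


# Hypothetical stand-in for a clean utterance: one second of a 440 Hz tone at 8 kHz.
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]

# Hypothetical SNR levels; the abstract does not list the values used.
training_set = build_multicondition_set(clean, [20, 15, 10, 5, 0])
```

In a full system, each noisy copy would be converted to spectral features and pooled with the clean data for model training. The missing-feature selection of matched components at recognition time is a separate step not shown here.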