Abstract

Noise compensation techniques for robust automatic speech recognition (ASR) attempt to improve system performance in the presence of acoustic interference. In feature-based noise compensation, which includes speech enhancement approaches, the acoustic features that are sent to the recognizer are first processed to remove the effects of noise (see Chapter 9). Model compensation approaches, in contrast, are concerned with modifying and even extending the acoustic model of speech to account for the effects of noise. A taxonomy of the different approaches to noise compensation is depicted in Figure 12.1, which serves as a road map for the present discussion. The two main strategies used for model compensation approaches are model adaptation and model-based noise compensation. Model adaptation approaches implicitly account for noise by adjusting the parameters of the acoustic model of speech, whereas model-based noise compensation approaches explicitly model the noise and its effect on the noisy speech features. Common adaptation approaches include maximum likelihood linear regression (MLLR) [56], maximum a posteriori (MAP) adaptation [32], and their generalizations [17, 29, 47]. These approaches, which are discussed in Chapter 11, alter the speech acoustic model in a completely data-driven way given additional training data or test data. Adaptation methods are somewhat more general than model-based approaches in that they may handle effects on the signal that are difficult to explicitly model, such as nonlinear distortion and changes in the voice in reaction to noise (the Lombard effect [53]). However, in the presence of additive noise, failing to take into account the known interactions between speech and noise can be detrimental to performance.
Model-based noise compensation approaches, in contrast to adaptation approaches, explicitly model the different factors present in the acoustic environment: the speech, the various sources of acoustic interference, and how they interact to form the noisy speech.
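As a rough illustration of the kind of explicit interaction model meant here (the notation below is a common convention, not the chapter's own): if speech and noise are assumed additive and uncorrelated in the power-spectral domain, then in the log-spectral (or log-Mel) domain the noisy observation $y$, clean speech $x$, and noise $n$ are related approximately by the well-known mismatch function

```latex
\exp(y) \approx \exp(x) + \exp(n)
\quad\Longrightarrow\quad
y \approx x + \log\bigl(1 + \exp(n - x)\bigr)
```

This nonlinear relationship is what model-based methods (e.g., vector Taylor series compensation) linearize or otherwise approximate in order to transform a clean-speech acoustic model into a model of noisy speech.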
