Channel distortion is one of the major factors which degrade the performances of automatic speech recognition (ASR) systems. Current compensation methods are generally based on the assumption that the channel distortion is a constant or slowly varying bias in an utterance or globally. However, this assumption is not sustained in a more complex circumstance, when the speech records being recognized are from many different unknown channels and have parts of the spectrum completely removed (e.g. band-limited speech). On the one hand, different channels may cause different distortions; on the other, the distortion caused by a given channel varies over the speech frames when parts of the speech spectrum are removed completely. As a result, the performance of the current methods is limited in complex environments. To solve this problem, we propose a unified framework in which the channel distortion is first divided into two subproblems, namely, spectrum missing and magnitude changing. Next, the two types of distortions are compensated with different techniques in two steps. In the first step, the speech bandwidth is detected for each utterance and the acoustic models are synthesized with clean models to compensate for spectrum missing. In the second step, the constant term of the distortion is estimated via the expectation-maximization (EM) algorithm and subtracted from the means of the synthesized model to further compensate for magnitude changing. Several databases are chosen to evaluate the proposed framework. The speech in these databases is recorded in different channels, including various microphones and band-limited channels. Moreover, to simulate more types of spectrum missing, various low-pass and band-pass filters are used to process the speech from the chosen databases. Although these databases and their filtered versions make the channel conditions more challenging for recognition, experimental results show that the proposed framework can substantially improve the performance of ASR systems in complex channel environments.
Read full abstract