Many modern smart devices are equipped with a microphone array and a loudspeaker (or are able to connect to one). Acoustic echo cancellation algorithms, specifically their multi-microphone variants, are essential components in such devices. Besides acoustic echoes, two other interference sources are commonly encountered in telecommunication systems: noise and reverberation. Reverberation may degrade the quality of the desired speech in acoustic enclosures, especially when the speaker is far from the array. Although sub-optimal, the common practice in such scenarios is to treat each problem separately. In the current contribution, we propose a unified statistical model that addresses the three problems simultaneously. Specifically, we propose a recursive expectation-maximization (REM) algorithm for joint echo cancellation, dereverberation, and noise reduction. The proposed approach is derived in the short-time Fourier transform (STFT) domain, with time-domain filtering approximated by the convolutive transfer function (CTF) model. In the E-step, a Kalman filter is applied to estimate the near-end speaker's signal from the noisy and reverberant microphone signals and the echo reference signal. In the M-step, the model parameters, including the acoustic systems, are inferred. Experiments with human speakers were carried out to examine the performance in dynamic scenarios, including a walking speaker and a moving microphone array. The results demonstrate the effectiveness of the echo canceller in adverse conditions, together with a significant reduction in reverberation and noise. Moreover, the tracking capabilities of the proposed algorithm were shown to outperform those of baseline methods.
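The CTF approximation can be illustrated with a minimal Python/NumPy sketch of the generic model. This is not the authors' implementation. In each STFT frequency bin, a microphone signal is modeled as a short convolution along the frame axis. Here, the near-end speech passes through one CTF and the echo reference through another, and additive noise is added on top. The function names (`ctf_filter`, `mic_signal`), the array shapes, and the single-microphone additive mixture are illustrative assumptions. The REM and Kalman machinery described above is not shown.

```python
import numpy as np


def ctf_filter(X, H):
    """Apply a convolutive transfer function (CTF) in the STFT domain.

    X: source STFT, shape (num_frames, num_bins), complex.
    H: CTF coefficients, shape (num_taps, num_bins), complex; one short
       filter along the frame axis for each frequency bin (assumed layout).
    Returns Y with Y[t, k] = sum_l H[l, k] * X[t - l, k], treating X as zero
    for t - l < 0.
    """
    num_frames, num_bins = X.shape
    num_taps = H.shape[0]
    Y = np.zeros((num_frames, num_bins), dtype=complex)
    # Taps delayed beyond the signal length contribute nothing, so skip them.
    for l in range(min(num_taps, num_frames)):
        # Delay the source by l frames (zero-padded at the start) and weight
        # it bin-wise by the l-th CTF coefficient.
        Y[l:, :] += H[l, :] * X[: num_frames - l, :]
    return Y


def mic_signal(S, R, H_s, H_e, V):
    """Hypothetical single-microphone observation model (illustrative only):
    reverberant near-end speech + echo of the reference signal + noise."""
    return ctf_filter(S, H_s) + ctf_filter(R, H_e) + V


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames, bins, taps = 50, 4, 3
    S = rng.standard_normal((frames, bins)) + 1j * rng.standard_normal((frames, bins))
    R = rng.standard_normal((frames, bins)) + 1j * rng.standard_normal((frames, bins))
    H_s = rng.standard_normal((taps, bins)) + 0j
    H_e = rng.standard_normal((taps, bins)) + 0j
    V = 0.01 * rng.standard_normal((frames, bins)) + 0j
    Y = mic_signal(S, R, H_s, H_e, V)
    print(Y.shape)
```

Filtering per bin along frames keeps each bin independent, which is what makes a per-frequency treatment (such as a Kalman filter in each bin) tractable under this approximation.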