We propose an optimized speech enhancement method that combines acoustic echo reduction, speech dereverberation, and noise reduction in a unified framework. In general, optimizing acoustic echo reduction, speech dereverberation, and noise reduction separately does not yield a jointly optimal result. In a cascade of these functions, each stage interferes with the others, which degrades the final speech enhancement performance. Unlike cascade methods, the proposed method combines all functions in a unified framework that optimizes the final speech enhancement performance and is robust against this mutual interference. In addition to time-invariant linear filters, the proposed method uses time-varying filters to reduce the residual reverberation, residual acoustic echo, and background noise that the time-invariant filters cannot remove. Both types of filters are optimized under a single likelihood function to avoid mutual interference. By combining the time-invariant linear filters and the time-varying filters, the proposed method models the microphone input signal with a local Gaussian model that has a full-rank covariance matrix and a non-zero mean vector. The local Gaussian model accounts for the non-stationarity of speech sources, which allows them to be enhanced effectively. Under this probabilistic model, all parameters are optimized simultaneously with the expectation-maximization (EM) algorithm, and a minimum mean squared error (MMSE) estimate of the desired signal is computed. Experimental results show that the proposed method outperforms the cascade methods.
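The abstract does not state the model equations. As a rough illustration only, the following sketch assumes one common form of such a model at a single time-frequency bin: the microphone input is x = μ + s + r. Here μ is a mean built from an echo-path filter applied to the loudspeaker reference plus a linear-prediction filter applied to past microphone frames. s is the desired speech, with time-varying power v_s and full-rank spatial covariance R_s. r is the residual interference with covariance R_v. The functions `linear_mean`, `e_step`, and `update_speech_power` and all of their parameters are hypothetical names introduced here. The code shows a generic E-step (an MMSE estimate via a time-varying multichannel Wiener filter) and the M-step for v_s. It is not the authors' full algorithm, which also re-estimates the linear filters and the covariance matrices.

```python
import numpy as np


def linear_mean(H_echo, u_ref, G_derev, x_past):
    """Hypothetical mean vector contributed by time-invariant linear filters at one bin.

    H_echo  : (M, L) echo-path filter applied to the loudspeaker reference u_ref, shape (L,)
    G_derev : (M, K) linear-prediction (dereverberation) filter applied to stacked past
              microphone frames x_past, shape (K,)
    """
    return H_echo @ u_ref + G_derev @ x_past


def e_step(x, mu, v_s, R_s, R_v):
    """Posterior moments of the desired signal s given the microphone input x.

    Assumed local Gaussian model at one time-frequency bin (illustrative only):
        x = mu + s + r,   s ~ CN(0, v_s * R_s),   r ~ CN(0, R_v)
    where v_s is the time-varying speech power, R_s a full-rank spatial covariance,
    and r the residual echo, reverberation, and noise left by the linear filters.
    Returns s_hat = E[s | x] (the MMSE estimate) and E[s s^H | x].
    """
    C_s = v_s * R_s                       # speech covariance at this bin
    C_x = C_s + R_v                       # covariance of x around its mean mu
    # Time-varying multichannel Wiener filter W = C_s C_x^{-1}. Because C_s and C_x are
    # Hermitian, W equals (C_x^{-1} C_s)^H, computed with solve() instead of an inverse.
    W = np.linalg.solve(C_x, C_s).conj().T
    s_hat = W @ (x - mu)                  # MMSE estimate of the desired signal
    post_cov = C_s - W @ C_s              # posterior covariance of s
    R_hat = np.outer(s_hat, s_hat.conj()) + post_cov
    return s_hat, R_hat


def update_speech_power(R_s, R_hat):
    """M-step for the time-varying power: v_s = tr(R_s^{-1} E[s s^H | x]) / M."""
    M = R_s.shape[0]
    return float(np.real(np.trace(np.linalg.solve(R_s, R_hat)))) / M
```

In a full EM loop these two steps would alternate over all bins together with updates of the linear filters, R_s, and R_v. Because W depends on v_s, it changes from frame to frame, which is what makes the second stage a time-varying filter on top of the time-invariant mean μ.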