Abstract
Compared with single-channel speech enhancement methods, multichannel methods can utilize spatial information to design optimal filters. Although some filters adapt to second-order signal statistics, the temporal evolution of the speech spectrum is usually neglected. By using linear prediction (LP) to model the interframe temporal evolution of speech, single-channel Kalman filtering (KF) based methods have been developed for speech enhancement. In this paper, we derive a multichannel KF (MKF) that jointly uses interchannel spatial correlation and interframe temporal correlation for speech enhancement. We perform LP in the modulation domain and, by incorporating the spatial information, derive an optimal MKF gain in the short-time Fourier transform domain. We show that the proposed MKF reduces to the conventional multichannel Wiener filter if the LP information is discarded. Furthermore, we show that, under an appropriate assumption, the MKF is equivalent to a concatenation of the minimum variance distortionless response beamformer and a single-channel modulation-domain KF, and we therefore present an alternative implementation of the MKF. Experiments conducted on a public head-related impulse response database demonstrate the effectiveness of the proposed method.
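To make the idea of combining LP with Kalman filtering concrete, the following is a minimal sketch of a textbook single-channel Kalman filter whose state model is an LP (autoregressive) recursion. It is not the MKF or the modulation-domain KF of this paper: the function name `lp_kalman_filter`, the AR(2) coefficients, and the variances are all assumptions chosen for illustration, and the LP coefficients are taken as known rather than estimated.

```python
import numpy as np

def lp_kalman_filter(noisy, lp_coeffs, excitation_var, noise_var):
    """Kalman-filter a real sequence whose clean part follows an AR(p) model.

    Assumed model (illustrative, not the paper's):
        s[n] = sum_k a[k] * s[n-1-k] + e[n],   y[n] = s[n] + v[n]
    """
    p = len(lp_coeffs)
    # Companion-form transition matrix built from the LP coefficients.
    A = np.zeros((p, p))
    A[0, :] = lp_coeffs
    A[1:, :-1] = np.eye(p - 1)
    c = np.zeros(p)
    c[0] = 1.0                      # observation picks the newest sample
    Q = np.zeros((p, p))
    Q[0, 0] = excitation_var        # LP excitation drives only the newest sample

    x = np.zeros(p)                 # state estimate
    P = np.eye(p)                   # state error covariance
    out = np.empty(len(noisy))
    for n, y in enumerate(noisy):
        # Prediction step using the LP model.
        x = A @ x
        P = A @ P @ A.T + Q
        # Update step with the noisy observation.
        k = P @ c / (c @ P @ c + noise_var)
        x = x + k * (y - c @ x)
        P = P - np.outer(k, c @ P)
        out[n] = x[0]
    return out

# Synthetic example: a stable AR(2) sequence observed in white noise.
rng = np.random.default_rng(0)
a = np.array([1.5, -0.7])           # hypothetical LP coefficients
clean = np.zeros(2000)
for n in range(2, len(clean)):
    clean[n] = a[0] * clean[n - 1] + a[1] * clean[n - 2] + rng.standard_normal()
noisy = clean + 2.0 * rng.standard_normal(len(clean))   # noise variance 4
estimate = lp_kalman_filter(noisy, a, excitation_var=1.0, noise_var=4.0)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_kalman = np.mean((estimate - clean) ** 2)
```

In this toy setting the prediction step carries the temporal (interframe) information, which is the ingredient that a filter based only on second-order statistics of the current frame does not use.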
Highlights
Interference from environmental noise brings great challenges to speech processing systems in speech communication, hearing aids, and automatic speech recognition.
We show that the proposed multichannel KF (MKF) reduces to the conventional multichannel Wiener filter (MWF) if the linear prediction (LP) information is discarded, and that, under appropriate conditions, the MKF is equivalent to a minimum variance distortionless response (MVDR) beamformer followed by a single-channel modulation-domain Kalman filter (MDKF).
A modulation-domain MKF is proposed in this paper for multichannel speech enhancement.
Summary
Interference from environmental noise brings great challenges to speech processing systems in speech communication, hearing aids, and automatic speech recognition. It has been shown that the generalized sidelobe canceller (GSC) is equivalent to the MVDR beamformer when a delay-and-sum (DS) beamformer is used in the beamforming stage [19], [20]. Another category of multichannel speech enhancement algorithms is multichannel Wiener filtering (MWF) [21]–[23], which can operate without explicit knowledge of the steering vector or relative transfer function (RTF) and estimates the target signal under the minimum mean squared error (MMSE) criterion. In the single-channel modulation-domain Kalman filtering (MDKF) methods [31]–[39], a modulation-domain state vector is defined to represent the amplitude of the clean speech to be estimated. We show that the proposed MKF reduces to the conventional MWF if the LP information is discarded, and that, under appropriate conditions, the MKF is equivalent to an MVDR beamformer followed by a single-channel MDKF.
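The "beamformer followed by a single-channel filter" structure has a classical counterpart for the MWF: when the speech covariance is rank one, the MWF factors into an MVDR beamformer followed by a single-channel Wiener gain. The sketch below checks that well-known decomposition numerically for one frequency bin. It does not implement the MKF. The helper names, the four-microphone setup, and the random covariance values are assumptions made for illustration, and the steering vector is normalized so that the first microphone is the reference.

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """MVDR weights: w = Phi_v^{-1} d / (d^H Phi_v^{-1} d)."""
    num = np.linalg.solve(noise_cov, steering)
    return num / (steering.conj() @ num)

def mwf_weights(speech_cov, noise_cov, ref=0):
    """MWF weights for the reference microphone: w = (Phi_s + Phi_v)^{-1} Phi_s e_ref."""
    return np.linalg.solve(speech_cov + noise_cov, speech_cov[:, ref])

rng = np.random.default_rng(0)
M = 4                                         # number of microphones (assumed)

# Hypothetical relative transfer function with d[0] = 1 (reference microphone).
d = rng.standard_normal(M) + 1j * rng.standard_normal(M)
d = d / d[0]

phi_s = 2.0                                   # speech power at the reference (assumed)
speech_cov = phi_s * np.outer(d, d.conj())    # rank-one speech covariance

# Random Hermitian positive-definite noise covariance (assumed).
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
noise_cov = A @ A.conj().T + M * np.eye(M)

w_mvdr = mvdr_weights(noise_cov, d)
# Residual noise power at the MVDR output: 1 / (d^H Phi_v^{-1} d).
phi_v_out = 1.0 / np.real(d.conj() @ np.linalg.solve(noise_cov, d))
# Single-channel Wiener gain applied to the MVDR output.
wiener_gain = phi_s / (phi_s + phi_v_out)

w_mwf = mwf_weights(speech_cov, noise_cov)
same_filter = np.allclose(w_mwf, wiener_gain * w_mvdr)
distortionless = np.isclose(w_mvdr.conj() @ d, 1.0)
```

Here `same_filter` confirms that the MWF weights equal the MVDR weights scaled by the Wiener gain, and `distortionless` confirms the MVDR constraint w^H d = 1. According to the text above, the MKF admits an analogous factorization in which the single-channel stage is an MDKF rather than a Wiener gain.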