Abstract

Recently proposed model-based methods use time-frequency (T-F) masking for source separation, where the T-F masks are derived from various cues described by a frequency domain Gaussian mixture model (GMM). These methods work well for separating mixtures recorded in low-to-medium level of reverberation, however, their performance degrades as the level of reverberation is increased. We note that the relatively poor performance of these methods under reverberant conditions can be attributed to the high variance of the frequency-dependent GMM parameter estimates. To address this limitation, a novel bootstrap-based approach is proposed to improve the accuracy of expectation maximization estimates of a frequency-dependent GMM based on an a priori chosen initialization scheme. It is shown how the proposed technique allows us to construct time-frequency masks which lead to improved model-based source separation for reverberant speech mixtures. Experiments and analysis are performed on speech mixtures formed using real room-recorded impulse responses.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call