Abstract

This paper describes a computationally efficient statistical approach to joint (semi-)blind source separation and dereverberation for multichannel noisy reverberant mixture signals. A standard approach to source separation is to formulate a generative model of a multichannel mixture spectrogram that consists of source and spatial models representing the time-frequency power spectral densities (PSDs) and spatial covariance matrices (SCMs) of source images, respectively, and to find the maximum-likelihood estimates of these parameters. A state-of-the-art blind source separation method in this thread of research is fast multichannel nonnegative matrix factorization (FastMNMF), based on low-rank PSDs and jointly diagonalizable full-rank SCMs. To perform the mutually dependent tasks of separation and dereverberation jointly, in this paper we integrate both moving average (MA) and autoregressive (AR) models, which represent the early reflections and late reverberations of sources, respectively, into the FastMNMF formalism. Using a pretrained deep generative model of speech PSDs as a source model, we realize semi-blind joint speech separation and dereverberation. We derive an iterative optimization algorithm based on iterative projection or iterative source steering that jointly and efficiently updates the AR parameters and the SCMs. Our experimental results showed the superiority of the proposed ARMA extension over its AR- and MA-ablated versions in speech separation and/or dereverberation tasks.
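For concreteness, the standard FastMNMF generative model and the AR (late-reverberation) component referenced above can be sketched as follows; the notation here is a generic reconstruction, not necessarily the paper's own, and the exact ARMA parameterization proposed in the paper may differ:

```latex
% Spatial mixture model: each M-channel observation x_{ft} is
% zero-mean complex Gaussian, with covariance summed over N sources
\mathbf{x}_{ft} \sim \mathcal{N}_{\mathbb{C}}\!\left(\mathbf{0},\;
  \sum_{n=1}^{N} \lambda_{nft}\,\mathbf{G}_{nf}\right)

% Source model: low-rank PSDs via NMF with K basis spectra
\lambda_{nft} = \sum_{k=1}^{K} w_{nfk}\, h_{nkt}

% Spatial model: jointly diagonalizable full-rank SCMs, sharing a single
% diagonalizer Q_f across all sources in each frequency bin f
\mathbf{G}_{nf} = \mathbf{Q}_f^{-1}\,
  \mathrm{Diag}(\tilde{g}_{nf1},\dots,\tilde{g}_{nfM})\,
  \mathbf{Q}_f^{-\mathsf{H}}

% AR model of late reverberation (weighted-prediction-error style):
% the dereverberated signal d_{ft} is the observation minus a delayed
% multichannel linear prediction with delay Delta and L taps
\mathbf{d}_{ft} = \mathbf{x}_{ft}
  - \sum_{\tau=\Delta}^{\Delta+L-1} \mathbf{A}_{f\tau}^{\mathsf{H}}\,
    \mathbf{x}_{f,t-\tau}
```

The shared diagonalizer $\mathbf{Q}_f$ is what makes FastMNMF fast: after transforming by $\mathbf{Q}_f$, all source covariances become diagonal, so per-bin likelihood evaluations avoid full matrix inversions.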
