Generalized Multichannel Variational Autoencoder for Underdetermined Source Separation

Shogo Seki, Kazuya Takeda, Li Li, Hirokazu Kameoka, Tomoki Toda

doi:10.23919/eusipco.2019.8903054

Abstract

This paper deals with multichannel audio source separation under underdetermined conditions. Multichannel Non-negative Matrix Factorization (MNMF) is a powerful approach that adopts the NMF concept for source power spectrogram modeling. It works reasonably well for particular types of sound sources; however, it can fail for sources whose spectrograms do not comply with the NMF model. To address this limitation, a technique called the Multichannel Variational Autoencoder (MVAE) method was recently proposed, in which a Conditional VAE (CVAE) is used instead of the NMF model for source power spectrogram modeling. This approach has been shown to perform impressively in determined source separation tasks thanks to the representation power of deep neural networks (DNNs). This paper generalizes MVAE, originally formulated under determined mixing conditions, so that it can also deal with underdetermined cases. The proposed method was evaluated on an underdetermined source separation task of separating three sources from two microphone inputs. Experimental results revealed that the generalized MVAE method achieved better performance than the conventional MNMF method.

Full Text