Abstract

This paper deals with a multichannel audio source separation problem under underdetermined conditions. Multichannel non-negative matrix factorization (MNMF) is a powerful method for underdetermined audio source separation, which adopts the NMF concept to model and estimate the power spectrograms of the sound sources in a mixture signal. This concept is also used in independent low-rank matrix analysis (ILRMA), a special class of MNMF formulated under determined conditions. While these methods work reasonably well for particular types of sound sources, one limitation is that they can fail for sources whose spectrograms do not comply with the NMF model. To address this limitation, an extension of ILRMA called the multichannel variational autoencoder (MVAE) method was recently proposed, in which a conditional VAE (CVAE) is used instead of the NMF model to express source power spectrograms. Thanks to the representation power of deep neural networks, this approach has performed impressively in determined source separation tasks. While the original MVAE method was formulated under determined mixing conditions, this paper proposes a generalized version of it that combines the ideas of MNMF and MVAE so that it can also deal with underdetermined cases. We call this method the generalized MVAE (GMVAE) method. In underdetermined source separation and speech enhancement experiments, the proposed method performed better than baseline methods.
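The NMF concept referred to above approximates a non-negative power spectrogram P (frequency bins by frames) with the product WH of a basis matrix W and an activation matrix H of rank K. The single-channel Python/NumPy sketch below is only an illustration of that source model, not the MNMF or ILRMA algorithm itself. It assumes the Itakura-Saito divergence (a common choice for power spectrograms) and uses majorization-minimization multiplicative updates with a square-root exponent, which keep W and H non-negative and are designed not to increase the divergence. The names `is_nmf` and `is_divergence` are hypothetical.

```python
import numpy as np

def is_divergence(P, V):
    """Itakura-Saito divergence D_IS(P | V), summed over all time-frequency bins."""
    ratio = P / V
    return float(np.sum(ratio - np.log(ratio) - 1.0))

def is_nmf(P, K, n_iter=200, eps=1e-12, seed=0):
    """Approximate a positive power spectrogram P (F x T) as W @ H with K bases.

    Multiplicative updates with exponent 1/2 (majorization-minimization for the
    Itakura-Saito divergence); W and H stay non-negative throughout.
    """
    rng = np.random.default_rng(seed)
    F, T = P.shape
    W = rng.uniform(0.1, 1.0, size=(F, K))   # spectral bases
    H = rng.uniform(0.1, 1.0, size=(K, T))   # time-varying activations
    for _ in range(n_iter):
        V = W @ H + eps
        W *= np.sqrt(((P / V**2) @ H.T) / ((1.0 / V) @ H.T))
        V = W @ H + eps                      # refresh the model after updating W
        H *= np.sqrt((W.T @ (P / V**2)) / (W.T @ (1.0 / V)))
    return W, H

# Usage on a synthetic spectrogram that is exactly rank 3
rng = np.random.default_rng(1)
P = rng.uniform(0.5, 1.5, (64, 3)) @ rng.uniform(0.5, 1.5, (3, 100))
W, H = is_nmf(P, K=3)
print(is_divergence(P, W @ H))
```

A spectrogram that a small set of fixed spectral bases cannot capture well is the kind of source for which, as stated above, NMF-based models can fail.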

Highlights

  • Blind source separation (BSS) refers to the problem of separating out underlying source signals present in observed mixture signals received by a microphone array

  • While the original multichannel variational autoencoder (MVAE) method was formulated under determined mixing conditions, we propose a generalized version of it by combining the ideas of multichannel non-negative matrix factorization (MNMF) and the MVAE method so that it can deal with underdetermined cases

  • The proposed generalized MVAE (GMVAE) method performed better than baseline methods in underdetermined source separation and speech enhancement experiments


Summary

INTRODUCTION

Blind source separation (BSS) refers to the problem of separating out underlying source signals present in observed mixture signals received by a microphone array. It is worth noting that the optimization algorithms for MNMF and ILRMA are guaranteed to converge to a stationary point and work reasonably well for some types of sound sources. However, they can fail when encountering sound sources whose spectrograms do not follow the NMF model, which limits their performance. Another VAE-based method worth noting is the multichannel VAE (MVAE) method [14], [15]. This method is an extension of ILRMA, differing in that a conditional VAE (CVAE) [20] is used instead of the NMF model as a generative model of source spectrograms. While the original MVAE method was formulated under determined mixing conditions, we propose a generalized version of it that combines the ideas of MNMF and the MVAE method so that it can deal with underdetermined cases. Note that this paper is an extended journal version of our preprint [18] and conference paper [19].
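For the underdetermined case (more sources N than microphones M), MNMF-type methods commonly model each time-frequency bin x(f, t) of the M-channel mixture STFT as a zero-mean complex Gaussian with covariance Σ(f, t) = Σ_n v_n(f, t) R_n(f), where v_n is the power spectrogram of source n and R_n(f) is a full-rank spatial covariance matrix. The NumPy sketch below assumes this formulation. It evaluates the negative log-likelihood that such methods minimize in some form and separates the sources with a multichannel Wiener filter. It illustrates the general model only, not the GMVAE algorithm: in MNMF v_n would come from an NMF model and, as described above, GMVAE would instead obtain it from a CVAE, whereas here v is simply a given array and the parameter updates for v and R are omitted. All function names are hypothetical.

```python
import numpy as np

def mixture_covariance(v, R):
    """Sigma[f, t] = sum_n v[n, f, t] * R[n, f].

    v: (N, F, T) non-negative source power spectrograms
    R: (N, F, M, M) Hermitian positive-definite spatial covariance matrices
    """
    return np.einsum("nft,nfij->ftij", v, R)

def neg_log_likelihood(X, v, R):
    """Negative log-likelihood of the mixture STFT X (F, T, M) under
    x(f, t) ~ N_C(0, Sigma(f, t)), dropping the constant M*log(pi) per bin."""
    Sigma = mixture_covariance(v, R)
    Sinv_x = np.linalg.solve(Sigma, X[..., None])[..., 0]      # Sigma^{-1} x
    quad = np.einsum("ftm,ftm->ft", X.conj(), Sinv_x).real     # x^H Sigma^{-1} x
    _, logdet = np.linalg.slogdet(Sigma)                       # log det Sigma
    return float(np.sum(quad + logdet))

def wiener_separate(X, v, R):
    """Multichannel Wiener filter y_n(f, t) = v_n(f, t) R_n(f) Sigma(f, t)^{-1} x(f, t).

    Returns source images of shape (N, F, T, M).
    """
    Sigma = mixture_covariance(v, R)
    Sinv_x = np.linalg.solve(Sigma, X[..., None])[..., 0]
    return np.einsum("nft,nfij,ftj->nfti", v, R, Sinv_x)

# Toy underdetermined example: N = 3 sources, M = 2 microphones
rng = np.random.default_rng(0)
N, F, T, M = 3, 4, 5, 2
A = rng.standard_normal((N, F, M, M)) + 1j * rng.standard_normal((N, F, M, M))
R = A @ A.conj().swapaxes(-1, -2) + 0.1 * np.eye(M)   # positive-definite R_n(f)
v = rng.uniform(0.1, 1.0, (N, F, T))   # stand-in for NMF or CVAE-based powers
X = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))
Y = wiener_separate(X, v, R)
print(Y.shape, neg_log_likelihood(X, v, R))
```

Because Σ_n v_n R_n Σ^{-1} equals the identity, the separated source images add back up to the observed mixture, which gives a quick sanity check on any implementation of this filter.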

PROBLEM FORMULATION
DEEP NEURAL NETWORK APPROACH
VAE-NMF
ADVANTAGES OVER RELATED WORK
CONCLUSION