Multichannel Non-Negative Matrix Factorization Using Banded Spatial Covariance Matrices in Wavenumber Domain

Yuki Mitsufuji, Stefan Uhlich, Hiroshi Saruwatari, Norihiro Takamune, Daichi Kitamura, Shoichi Koyama

doi:10.1109/taslp.2019.2948770

Abstract

Blind source separation exploiting multichannel information has long been a popular topic, and recently proposed methods based on the local Gaussian model have shown promising results despite their high computational cost when many microphone signals are used. The low updating speed of such a model is mainly due to the inversion of a spatial covariance matrix, whose complexity increases with the number of microphones, M, and is generally of order O(M^3). Several projection-based approaches that attempt to concentrate energy on the diagonal part of the spatial covariance matrix have been introduced to circumvent the matrix inversion, which can reduce the complexity to O(M). In this article, we focus on the fast Fourier transform as a projection method because it concentrates energy on the diagonal more efficiently than other projection-based methods. For the case where the diagonalization is imperfect, for example, owing to discontinuities at the edge of a linear array, we also developed a more robust algorithm approximating the tri-diagonal part of the spatial covariance matrix, which requires a complexity of O(M^2) for the inversion by applying the Thomas algorithm. To remove the ad-hoc integration of post clustering after the decomposition, we also examine a self-clustering algorithm. Our evaluation shows better results than other previously proposed methods in terms of the separation quality under reverberant conditions as well as higher efficiency than multichannel non-negative matrix factorization.
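To make the tri-diagonal inversion step concrete, the following is a minimal sketch of the textbook Thomas algorithm in Python with NumPy. It is not the authors' implementation: the function names `thomas_solve` and `tridiagonal_inverse` are hypothetical, no pivoting is performed, and the matrix is assumed to be well conditioned (for example Hermitian and diagonally dominant) with M >= 2. Each solve costs O(M), so building the full inverse column by column costs O(M^2), which is one way to read the complexity quoted in the abstract.

```python
import numpy as np


def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system A x = rhs in O(M) operations.

    lower: sub-diagonal of A, length M-1
    diag:  main diagonal of A, length M
    upper: super-diagonal of A, length M-1
    Assumption: M >= 2 and no pivoting is needed.
    """
    m = len(diag)
    c = np.zeros(m - 1, dtype=complex)  # modified super-diagonal
    d = np.zeros(m, dtype=complex)      # modified right-hand side

    # Forward sweep: eliminate the sub-diagonal.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, m):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < m - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom

    # Back substitution.
    x = np.zeros(m, dtype=complex)
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def tridiagonal_inverse(lower, diag, upper):
    """Build the dense inverse from M solves against unit vectors: O(M^2)."""
    m = len(diag)
    eye = np.eye(m)
    columns = [thomas_solve(lower, diag, upper, eye[:, j]) for j in range(m)]
    return np.column_stack(columns)
```

In practice a banded solver from a numerical library would usually be preferred over a hand-written loop; the sketch only illustrates why the cost stays below the O(M^3) of a general inversion.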

Highlights

Multichannel music source separation is one of the most actively studied topics in the audio signal processing field and various approaches have been proposed to tackle this difficult problem.
In 2005, the local Gaussian model was first applied to multichannel source separation [7], [8], in which the spectrum of each time-frequency bin is modeled as an instantaneous mixture of complex multivariate Gaussians.
Ozerov and Févotte applied a low-rank factorization in this framework for modeling source amplitudes of time-frequency bins [11]. Their approach can be regarded as the multichannel extension of the well-known non-negative matrix factorization (NMF) [12].

Summary

Introduction

Multichannel music source separation is one of the most actively studied topics in the audio signal processing field and various approaches have been proposed to tackle this difficult problem. In 2005, the local Gaussian model was first applied to multichannel source separation [7], [8], in which the spectrum of each time-frequency bin is modeled as an instantaneous mixture of complex multivariate Gaussians. Ozerov and Févotte applied a low-rank factorization in this framework for modeling source amplitudes of time-frequency bins [11]. Their approach can be regarded as the multichannel extension of the well-known non-negative matrix factorization (NMF) [12]. For the convergence of multichannel NMF, GEM-based parameter updates were shown to be much slower than multiplicative updates by comparison with non-negative tensor factorization (NTF) [20].
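As a point of reference for the multiplicative updates mentioned above, here is a minimal sketch of ordinary single-channel NMF with the classic multiplicative update rules for the squared Euclidean cost. It is an illustration under stated assumptions, not the multichannel algorithm of the paper: the function name `nmf_multiplicative`, the choice of cost function, the random initialization, and the small constant `eps` are all assumptions made for this example.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Factorize a non-negative matrix V (freq x frames) as W @ H.

    Uses multiplicative updates for the squared Euclidean cost.
    Returns W, H and the cost after each iteration.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps    # spectral bases
    H = rng.random((rank, n_frames)) + eps  # temporal activations
    losses = []
    for _ in range(n_iter):
        # Each factor is scaled element-wise by a non-negative ratio,
        # so non-negativity is preserved without any projection step.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        losses.append(float(np.linalg.norm(V - W @ H) ** 2))
    return W, H, losses
```

A typical use would pass a magnitude or power spectrogram as `V` and inspect `losses` to check that the cost has levelled off.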

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Nov 1, 2019
Citations: 52
License type: CC BY 4.0

Similar Papers

Joint-diagonalizability-constrained multichannel nonnegative matrix factorization based on time-variant multivariate complex sub-Gaussian distribution
Keigo Kamo ... Kazunobu Kondo
Signal Processing | VOL. 188
18 Jun 2021

Joint-Diagonalizability-Constrained Multichannel Nonnegative Matrix Factorization Based on Multivariate Complex Sub-Gaussian Distribution
Keigo Kamo ... Hiroshi Saruwatari
24 Jan 2021

Fast Multichannel Nonnegative Matrix Factorization With Directivity-Aware Jointly-Diagonalizable Spatial Covariance Matrices for Blind Source Separation
Kouhei Sekiguchi ... Aditya Arie Nugraha
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 28
01 Jan 2020

Regularized Fast Multichannel Nonnegative Matrix Factorization with ILRMA-Based Prior Distribution of Joint-Diagonalization Process
Keigo Kamo ... Yu Takahashi
01 May 2020
