Abstract

Blind source separation exploiting multichannel information has long been a popular topic, and recently proposed methods based on the local Gaussian model have shown promising results despite their high computational cost when many microphone signals are used. The low updating speed of such a model is mainly due to the inversion of a spatial covariance matrix, whose complexity increases with the number of microphones, M, and is generally of order O(M^3). Several projection-based approaches that attempt to concentrate energy on the diagonal part of the spatial covariance matrix have been introduced to circumvent the matrix inversion, which can reduce the complexity to O(M). In this article, we focus on the fast Fourier transform as a projection method because it concentrates energy on the diagonal more efficiently than other projection-based methods. For the case where the diagonalization is imperfect, for example, owing to discontinuities at the edge of a linear array, we also develop a more robust algorithm that approximates the tri-diagonal part of the spatial covariance matrix and requires a complexity of O(M^2) for the inversion by applying the Thomas algorithm. To remove the ad hoc integration of post-clustering after the decomposition, we also examine a self-clustering algorithm. Our evaluation shows better separation quality under reverberant conditions than previously proposed methods, as well as higher efficiency than multichannel non-negative matrix factorization.
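As a rough illustration of the tri-diagonal inversion step mentioned in the abstract, the following minimal sketch solves a tri-diagonal system with the Thomas algorithm and builds the inverse column by column. It is not the authors' implementation: the function names `thomas_solve` and `tridiagonal_inverse` are hypothetical, the matrix is assumed to be well conditioned (no pivoting is performed), and NumPy is used only for array handling.

```python
import numpy as np


def thomas_solve(lower, diag, upper, rhs):
    """Solve T x = rhs for a tri-diagonal matrix T (hypothetical helper).

    lower: sub-diagonal of T, length M-1
    diag:  main diagonal of T, length M
    upper: super-diagonal of T, length M-1
    rhs:   right-hand side, length M
    Assumption: T is well conditioned (for example diagonally dominant),
    because this sketch performs no pivoting.
    """
    m = len(diag)
    c = np.zeros(m, dtype=complex)  # modified super-diagonal (last entry unused)
    d = np.zeros(m, dtype=complex)  # modified right-hand side

    # Forward sweep: eliminate the sub-diagonal, O(M) operations.
    for i in range(m):
        denom = diag[i]
        num = rhs[i]
        if i > 0:
            denom = denom - lower[i - 1] * c[i - 1]
            num = num - lower[i - 1] * d[i - 1]
        if i < m - 1:
            c[i] = upper[i] / denom
        d[i] = num / denom

    # Back substitution, O(M) operations.
    x = np.zeros(m, dtype=complex)
    x[m - 1] = d[m - 1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def tridiagonal_inverse(lower, diag, upper):
    """Invert T by solving against each column of the identity.

    M solves of O(M) each give O(M^2) in total.
    """
    m = len(diag)
    eye = np.eye(m)
    cols = [thomas_solve(lower, diag, upper, eye[:, k]) for k in range(m)]
    return np.column_stack(cols)
```

Each call to `thomas_solve` costs O(M), so building the full inverse from the M columns of the identity costs O(M^2), which matches the order stated in the abstract; a dense inversion of the same matrix would generally cost O(M^3).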

Highlights

  • Multichannel music source separation is one of the most actively studied topics in the audio signal processing field and various approaches have been proposed to tackle this difficult problem

  • In 2005 the local Gaussian model was first applied to multichannel source separation [7], [8], in which the spectrum of each time-frequency bin is modeled as an instantaneous mixture of complex multivariate Gaussians

  • Ozerov and Févotte applied a low-rank factorization in this framework for modeling source amplitudes of time-frequency bins [11]. Their approach can be regarded as the multichannel extension of the well-known non-negative matrix factorization (NMF) [12]


Summary

Introduction

Multichannel music source separation is one of the most actively studied topics in the audio signal processing field, and various approaches have been proposed to tackle this difficult problem. In 2005, the local Gaussian model was first applied to multichannel source separation [7], [8]; in this model, the spectrum of each time-frequency bin is modeled as an instantaneous mixture of complex multivariate Gaussians. Ozerov and Févotte applied a low-rank factorization in this framework for modeling source amplitudes of time-frequency bins [11]. Their approach can be regarded as the multichannel extension of the well-known non-negative matrix factorization (NMF) [12]. Regarding the convergence of multichannel NMF, GEM-based parameter updates were shown to be much slower than multiplicative updates in a comparison with non-negative tensor factorization (NTF) [20].
