Abstract

In this paper we present a harmonic-constrained multichannel non-negative matrix factorization (MNMF) method for blind music source separation. In this model, the mixing filter encodes the spatial information as magnitude and phase differences between channels, whereas the source variances are modelled with a harmonic-constrained NMF structure. The spatial covariance matrix is obtained from the constant-Q transform, both to account for the logarithmic frequency scale inherent in music signals and to reduce the dimensionality of the parameters. Moreover, to mitigate the strong sensitivity to parameter initialization, we propose initializing the spatial weights with the output of the steered response power with phase transform (SRP-PHAT) algorithm. The proposed method is evaluated on a multichannel classical chamber music dataset with several polyphony and reverberation setups. A comparison with other state-of-the-art signal decomposition methods shows reliable results in terms of BSS_EVAL metrics.

Highlights

  • The aim of audio source separation is to segregate the constituent sound sources from an audio signal mixture.

  • Since most music audio is available only as mixtures, a system capable of music source separation has several applications, for example automatic karaoke creation, acoustic emphasis, music transcription, music unmixing and remixing, music production, and education.

  • In the far-field case, the interchannel level differences (ILDs) are practically negligible, and spatial information can only be exploited using interchannel phase differences (IPDs).


Summary

INTRODUCTION

The aim of audio source separation is to segregate the constituent sound sources from an audio signal mixture. A typical approach consists of decomposing a time-frequency representation of the mixture signal using methods such as non-negative matrix factorization (NMF), independent component analysis (ICA), or probabilistic latent component analysis (PLCA). In the far-field case (i.e. when the microphone array size is much smaller than the distances between the sources and the microphones), the ILDs are practically negligible, and spatial information can only be exploited using IPDs. Multichannel non-negative matrix factorization (MNMF) based approaches model the latent source magnitude or power spectrograms with NMF, while the spatial mixing system is modeled with a Gaussian probabilistic model applied directly to the complex-valued STFTs of all channels [19, 20, 21]. We present a blind music source separation approach based on MNMF in which the signal model is constrained to be harmonic.
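The far-field remark above can be checked with a small numerical sketch. It is an illustration only, not part of the proposed method: it assumes a single point source in free field with 1/r amplitude decay, a speed of sound of 343 m/s, and a hypothetical two-microphone array with 5 cm spacing. The function name `ild_ipd` and all geometry values are assumptions chosen for the example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def ild_ipd(source_xy, mic1_xy, mic2_xy, freq_hz):
    """Return (ILD in dB, IPD in radians) between two microphones.

    Assumes a point source in free field: amplitude decays as 1/r and
    the phase at each microphone is set by the propagation delay r/c.
    """
    source = np.asarray(source_xy, dtype=float)
    r1 = np.linalg.norm(source - np.asarray(mic1_xy, dtype=float))
    r2 = np.linalg.norm(source - np.asarray(mic2_xy, dtype=float))
    # Level of mic 1 relative to mic 2: 20*log10((1/r1) / (1/r2)).
    ild_db = 20.0 * np.log10(r2 / r1)
    # Unwrapped phase of mic 1 relative to mic 2 at this frequency.
    ipd_rad = 2.0 * np.pi * freq_hz * (r2 - r1) / SPEED_OF_SOUND
    return ild_db, ipd_rad


# Hypothetical array: two microphones 5 cm apart on the x axis.
mic1 = (-0.025, 0.0)
mic2 = (0.025, 0.0)

# Far-field source (2 m away) versus near-field source (10 cm away), at 1 kHz.
ild_far, ipd_far = ild_ipd((2.0, 0.0), mic1, mic2, freq_hz=1000.0)
ild_near, ipd_near = ild_ipd((0.1, 0.0), mic1, mic2, freq_hz=1000.0)

print(f"far field : ILD = {ild_far:+.2f} dB, IPD = {ipd_far:+.2f} rad")
print(f"near field: ILD = {ild_near:+.2f} dB, IPD = {ipd_near:+.2f} rad")
```

Under these assumptions the 2 m source yields a level difference of only a fraction of a decibel while the phase difference is close to one radian, whereas the 10 cm source yields a level difference of several decibels. This is the sense in which IPDs remain the usable spatial cue once the source distance is large compared with the array size.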

PROBLEM SPECIFICATION
MULTICHANNEL NMF
BEAMFORMING INSPIRED DOA-SCM MODEL
PROPOSED MULTICHANNEL HARMONIC MULTI-EXCITATION MODEL
PARAMETER ESTIMATION
EXPERIMENTAL RESULTS AND DISCUSSION
CONCLUSION