Stereo channel music signal separation based on non-negative tensor factorization with cepstrum regularization

Shogo Seki, Kazuya Takeda, Tomoki Toda, Kento Ohtani

doi:10.1121/1.4969177

Abstract

Music signals are usually generated by mixing many music source signals, such as instrumental and vocal sounds, and they are often represented as 2-channel (i.e., stereo) signals. Underdetermined source separation, which separates a music signal into its individual music source signals, is a promising technique for developing applications such as music transcription, singer discrimination, and vocal extraction. One of the most powerful underdetermined source separation methods is Nonnegative Matrix Factorization (NMF), which models the power spectrogram of an observed signal as the product of two nonnegative matrices: a basis matrix and an activation matrix. To apply NMF to stereo music signal separation, we previously proposed Nonnegative Tensor Factorization (NTF), which additionally introduces a gain matrix to represent mixing (i.e., panning) information. However, the separation performance of this method is insufficient because it uses little prior information about the acoustic characteristics of the individual music source signals. To address this issue, we propose a cepstrum regularization method for NTF that makes the power spectral envelopes of the separated signals close to those of the individual music source signals. We conduct experimental evaluations to investigate the effectiveness of the regularization method and identify remaining problems to be addressed.
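
The factorization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the cost function or the update rules, so the sketch assumes a squared Euclidean cost with standard multiplicative updates, and it omits the cepstrum regularization entirely. The names `nmf`, `ntf_model`, `W`, `H`, and `G` are hypothetical and chosen only for illustration.

```python
import numpy as np


def nmf(V, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Factorize a nonnegative spectrogram V (freq x time) as V ~ W @ H.

    Assumption: squared Euclidean cost with multiplicative updates.
    The abstract does not say which divergence the authors use.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, n_bases)) + eps  # basis matrix (spectral patterns)
    H = rng.random((n_bases, n_time)) + eps  # activation matrix (time-varying gains)
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def ntf_model(G, W, H):
    """Stereo model: X[c, f, t] = sum_k G[c, k] * W[f, k] * H[k, t].

    G (channels x bases) is a hypothetical gain matrix standing in for the
    panning information mentioned in the abstract.
    """
    return np.einsum("ck,fk,kt->cft", G, W, H)


# Synthetic rank-3 "power spectrogram" used only to exercise the functions.
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 30))
W, H = nmf(V, n_bases=3)

# Per-basis left/right gains, then the modeled 2-channel spectrogram.
G = rng.random((2, 3))
X = ntf_model(G, W, H)  # shape: (2, 20, 30)
```

In this sketch each basis has one gain per channel, so a source panned to one side would have bases whose gains are large in that channel and small in the other.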
