Abstract

We propose a new approach to solo/accompaniment separation from stereophonic music recordings which extends a monophonic algorithm we recently proposed. The solo part is modelled with a source/filter model, extended with two contributions: an explicit smoothing strategy for the filter frequency responses and an unvoicing model to capture the stochastic parts of the solo voice. The accompaniment is modelled as a general instantaneous mixture of several components, leading to a Nonnegative Matrix Factorization framework. The stereophonic signal is assumed to be the instantaneous mixture of the solo and accompaniment contributions. Both channels are then used jointly within a Maximum Likelihood framework to estimate all the parameters. Three rounds of parameter estimation are needed to sequentially estimate the melody, the voiced part and finally the unvoiced part of the solo. Our tests show a clear improvement from a monophonic reference system to the proposed stereophonic system, especially when the unvoicing model is included. The smoothness of the filters does not provide the desired improvement in solo/accompaniment separation, but may be useful in future applications such as lyrics recognition. Finally, our submissions to the Signal Separation Evaluation Campaign (SiSEC), for the “Professionally Produced Music Recordings” task, obtained very good results.
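As a rough sketch of the mixture model summarized above (the notation below is chosen here for illustration and need not match the paper's), each channel is written as an instantaneous mixture of the solo and accompaniment signals, with the accompaniment power spectrogram factorized in NMF form:

\[
x_c(t) = a_c^{V}\, v(t) + a_c^{M}\, m(t), \qquad c \in \{L, R\},
\]
\[
|M(f,n)|^2 \approx \sum_{k} W_{fk}\, H_{kn}, \qquad W_{fk}, H_{kn} \ge 0,
\]

where \(v\) and \(m\) denote the solo and accompaniment signals, \(a_c^{V}\) and \(a_c^{M}\) the channel gains, and \(W\), \(H\) the nonnegative spectral templates and activations, all of which would be estimated jointly from both channels under the Maximum Likelihood criterion described in the abstract.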
