Abstract

The mixture spectrogram of a song is commonly modeled as the superposition of a sparse spectrogram and a low-rank spectrogram, corresponding to the vocal part and the accompaniment part of the song, respectively. Based on this model, the singing voice can be separated from the background music. The quality of such separation can be limited, however, because the low-rank assumption does not always describe the accompaniment well, and because additional prior information about the vocal part, such as annotation, should be taken into account when designing the separation algorithm. Motivated by these considerations, this paper presents two categories of time–frequency source separation algorithms. The first models the vocal and instrumental spectrograms as a sparse matrix and a low-rank matrix, respectively, while also incorporating side information about the vocal part, namely a voice spectrogram reconstructed from the annotation. The second instead models the vocal and instrumental spectrograms as a sparse matrix and a group-sparse matrix, respectively. Evaluations on the iKala dataset show that the proposed methods are effective and efficient for both the separated singing voice and the music accompaniment.
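
To make the two model families concrete, the following is a minimal sketch in robust-PCA-style notation; the symbols (M for the mixture magnitude spectrogram, L for the low-rank accompaniment, S for the sparse vocal, \hat{S} for the annotation-reconstructed voice spectrogram, A for the group-sparse accompaniment) and the trade-off weights \lambda and \gamma are illustrative assumptions, not the paper's exact formulation.

% Category 1: sparse vocal + low-rank accompaniment, with annotation side
% information pulling the vocal estimate S toward the reconstructed
% spectrogram \hat{S} (penalty weights hypothetical).
\begin{equation}
  \min_{L,\,S}\; \|L\|_{*} + \lambda\,\|S\|_{1}
      + \gamma\,\|S - \hat{S}\|_{F}^{2}
  \quad \text{subject to} \quad M = L + S .
\end{equation}
% Category 2: the nuclear norm on the accompaniment is replaced by a
% group-sparsity penalty, e.g. a mixed \ell_{2,1} norm over predefined
% groups of time-frequency coefficients (again an assumed instance,
% not necessarily the grouping used by the authors).
\begin{equation}
  \min_{A,\,S}\; \|A\|_{2,1} + \lambda\,\|S\|_{1}
  \quad \text{subject to} \quad M = A + S .
\end{equation}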
