Abstract

An algorithm for separating vocals from polyphonic music is described. The algorithm consists of two stages: the first estimates the predominant melody line, and the second estimates the sinusoidal modeling parameters corresponding to that line. The melody line is estimated with a hidden Markov model whose feature set is the output of a multiple-fundamental-frequency estimator. The states of the hidden Markov model correspond to musical notes with different fundamental frequencies, and the state transition probabilities are determined by a musicological model. The sinusoidal modeling stage estimates the frequency, amplitude, and phase of each overtone of the predominant melody line in each frame, and can also include mechanisms that reduce the effect of interfering sound sources on the estimated parameters. The resulting separation algorithm is independent of singer identity, musical genre, and instrumentation. Simulation experiments on real polyphonic music show that the algorithm separates vocals from the accompaniment and provides robust results across various musical genres.
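The sinusoidal modeling stage can be sketched roughly as follows. This is a simplified illustration only, not the authors' implementation: it estimates each overtone's frequency, amplitude, and phase in a single Hann-windowed frame by picking the nearest DFT bin, given an externally supplied fundamental frequency (in the paper, the fundamental would come from the HMM melody tracker). All function names here are hypothetical.

```python
import numpy as np

def extract_harmonics(frame, f0, sr, n_harm=10):
    """Estimate (frequency, amplitude, phase) for each overtone of f0
    from one analysis frame, using nearest-DFT-bin peak picking.
    A simplified stand-in for a sinusoidal modeling analysis step."""
    N = len(frame)
    spec = np.fft.rfft(frame * np.hanning(N))
    params = []
    for k in range(1, n_harm + 1):
        fk = k * f0
        if fk >= sr / 2:          # skip overtones above Nyquist
            break
        b = int(round(fk * N / sr))  # nearest DFT bin for this overtone
        # Amplitude correction: Hann window coherent gain is ~0.5,
        # and a real sinusoid splits its energy over +/- frequencies.
        amp = 2.0 * np.abs(spec[b]) / (N * 0.5)
        params.append((fk, amp, np.angle(spec[b])))
    return params

def synth_frame(params, N, sr):
    """Resynthesize one frame as a sum of cosines from the
    estimated sinusoidal parameters."""
    t = np.arange(N) / sr
    out = np.zeros(N)
    for f, a, ph in params:
        out += a * np.cos(2 * np.pi * f * t + ph)
    return out
```

In a full separation system, frames like this would be resynthesized with overlap-add to obtain the vocal signal, and the accompaniment could be obtained by subtracting it from the mixture; interference-reduction mechanisms (mentioned in the abstract) would refine the raw bin estimates.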
