Group delay based music source separation using deep recurrent neural networks

Jilt Sebastian, Hema A Murthy

doi:10.1109/spcom.2016.7746672

Abstract

Deep Recurrent Neural Networks (DRNNs) have been used with great success in the challenging task of separating sources from a single-channel acoustic mixture. Conventionally, magnitude spectra are used to learn the characteristics of individual sources in such monaural blind source separation (BSS) tasks. The phase spectra, which inherently contain the timing information, are often ignored. In this work, we explore the use of the modified group delay (MOD-GD) function for learning the time-frequency masks of the sources in the monaural BSS problem. We demonstrate the use of MOD-GD through two music source separation tasks: singing voice separation on the MIR-1K data set and vocal-violin separation on the Carnatic music data set. We find that it outperforms the state-of-the-art feature in terms of Signal to Interference Ratio (SIR). Moreover, training and testing times are significantly reduced (by 50%) without compromising performance for the best-performing DRNN configuration.
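
As a rough illustration of the feature named above, the following sketch computes a group delay and a modified group delay spectrum for a single frame. It follows the formulation commonly given in the group delay literature (numerator from the spectra of x[n] and n·x[n], denominator from a cepstrally smoothed magnitude spectrum, followed by sign-preserving compression); the abstract itself does not spell out these details. The function names and the values of `n_fft`, `lifter`, `alpha`, and `gamma` are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

EPS = 1e-10  # small constant to avoid division by zero and log(0)


def group_delay(frame, n_fft=512):
    """Plain group delay of one frame, computed without phase unwrapping.

    Uses the identity tau(w) = (X_R*Y_R + X_I*Y_I) / |X|^2, where X is the
    spectrum of x[n] and Y is the spectrum of n*x[n].
    """
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + EPS)


def modified_group_delay(frame, n_fft=512, lifter=8, alpha=0.4, gamma=0.9):
    """Modified group delay (assumed standard form; parameters are illustrative).

    The denominator |X|^2 is replaced by a cepstrally smoothed spectrum raised
    to 2*gamma, and the result is compressed as sign(tau) * |tau|**alpha.
    """
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)
    Y = np.fft.rfft(n * frame, n_fft)

    # Cepstral smoothing: keep only the low-quefrency part of the real cepstrum.
    log_mag = np.log(np.abs(X) + EPS)
    cepstrum = np.fft.irfft(log_mag, n_fft)
    window = np.zeros(n_fft)
    window[:lifter] = 1.0
    if lifter > 1:
        window[-(lifter - 1):] = 1.0  # mirror half keeps the lifter symmetric
    smooth = np.exp(np.fft.rfft(cepstrum * window, n_fft).real)

    tau = (X.real * Y.real + X.imag * Y.imag) / (smooth ** (2 * gamma) + EPS)
    return np.sign(tau) * np.abs(tau) ** alpha


# Sanity check: an impulse delayed by 5 samples has a group delay of 5 everywhere.
impulse = np.zeros(64)
impulse[5] = 1.0
gd = group_delay(impulse)
mgd = modified_group_delay(impulse, alpha=1.0, gamma=1.0)
```

With `alpha=1` and `gamma=1` the modified function reduces to the plain group delay for this flat-spectrum impulse; smaller values compress the dynamic range of the feature.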

Full Text