Abstract

Monaural singing voice separation has received much attention in recent years. In this paper, we propose Fusion-Net, a novel neural network architecture for monaural singing voice separation that combines U-Net with a residual convolutional neural network to obtain a much deeper architecture with summation-based skip connections. In addition, we apply time-frequency masking to improve the separation results. Finally, as a post-processing step, we combine the phase spectra with the magnitude spectra to refine the singing voice separated from the music mixture. Experimental results on the ccMixter database demonstrate that the proposed method achieves better separation performance than the previous U-Net architecture.
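
The masking and phase steps mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the mask type or which phase is used, so the sketch assumes a soft ratio mask and reuse of the mixture phase, both common choices. The function names `soft_mask` and `apply_mask_with_mixture_phase` are hypothetical, and spectrograms are represented as plain nested lists (frames by frequency bins) to keep the example dependency-free.

```python
import cmath


def soft_mask(vocal_mag, accomp_mag, eps=1e-8):
    """Assumed soft ratio mask per time-frequency bin: |V| / (|V| + |A|).

    vocal_mag and accomp_mag are estimated magnitude spectrograms with the
    same shape; eps avoids division by zero in silent bins.
    """
    return [
        [v / (v + a + eps) for v, a in zip(v_row, a_row)]
        for v_row, a_row in zip(vocal_mag, accomp_mag)
    ]


def apply_mask_with_mixture_phase(mixture_stft, mask):
    """Scale the mixture magnitude by the mask and reuse the mixture phase.

    mixture_stft holds complex STFT values of the mixture. Reusing the
    mixture phase is an assumption of this sketch, not a detail taken
    from the abstract.
    """
    return [
        [cmath.rect(m * abs(x), cmath.phase(x)) for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(mixture_stft, mask)
    ]


# Toy example: one frame with two frequency bins.
mixture = [[3 + 4j, 0 + 2j]]
vocal_estimate = [[4.0, 0.0]]    # hypothetical network output (magnitudes)
accomp_estimate = [[1.0, 2.0]]   # hypothetical network output (magnitudes)

mask = soft_mask(vocal_estimate, accomp_estimate)
vocal_stft = apply_mask_with_mixture_phase(mixture, mask)
```

In this toy example the first bin is dominated by the vocal estimate, so most of the mixture energy is kept there, while the second bin is attributed entirely to the accompaniment and is suppressed. An inverse STFT of the resulting complex values would then give the time-domain vocal signal.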

