Monaural Singing Voice Separation Using Fusion-Net with Time-Frequency Masking

Feng Li, Masato Akagi, Kaizhi Qian, Mark Hasegawa-Johnson

doi:10.1109/apsipaasc47483.2019.9023055

Publication Date: Nov 1, 2019

Affiliation: Japan Advanced Institute of Science and Technology, University of Illinois Urbana-Champaign

Abstract

Monaural singing voice separation has received much attention in recent years. In this paper, we propose Fusion-Net, a novel neural network architecture for monaural singing voice separation that combines U-Net with a residual convolutional neural network to form a much deeper architecture with summation-based skip connections. In addition, we apply time-frequency masking to improve the separation results. Finally, as a post-processing step, we integrate the phase spectra with the magnitude spectra to refine the singing voice separated from the music mixture. Experimental results on the ccMixter database demonstrate that the proposed method achieves better separation performance than the previous U-Net architecture.
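The abstract names time-frequency masking and the combination of phase and magnitude spectra, but it does not state which mask or which phase is used. As a minimal illustration only, the sketch below assumes a soft ratio mask computed from estimated vocal and accompaniment magnitudes and assumes the mixture phase is reused for reconstruction. The function names, the `EPS` constant, and the toy values are hypothetical and are not taken from the paper. Plain Python is used so that the example has no dependencies.

```python
import cmath

EPS = 1e-8  # assumed small constant; avoids division by zero in silent bins


def soft_mask(vocal_mag, accomp_mag):
    """Ratio mask per time-frequency bin: |V| / (|V| + |A|)."""
    return [
        [v / (v + a + EPS) for v, a in zip(v_row, a_row)]
        for v_row, a_row in zip(vocal_mag, accomp_mag)
    ]


def apply_mask_with_mixture_phase(mixture_stft, mask):
    """Scale the mixture magnitude by the mask and reuse the mixture phase."""
    return [
        [cmath.rect(m * abs(x), cmath.phase(x)) for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(mixture_stft, mask)
    ]


# Toy 2x2 complex "spectrogram" (frequency bins x frames); values are made up.
mixture = [[3 + 4j, 1j], [2 + 0j, -1 + 0j]]
vocal_est = [[4.0, 0.5], [0.0, 1.0]]   # hypothetical network output magnitudes
accomp_est = [[1.0, 0.5], [2.0, 0.0]]  # hypothetical network output magnitudes

mask = soft_mask(vocal_est, accomp_est)
vocal_stft = apply_mask_with_mixture_phase(mixture, mask)
```

In a full pipeline, the masked complex spectrogram would then be passed to an inverse short-time Fourier transform to obtain the time-domain singing voice.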

Full Text