Abstract

In this paper, we propose a novel deep convolutional neural network (DCNN) embedded with our feature extraction module (FEM) for monaural music source separation. UNet++ is introduced into our FEM for highly flexible feature fusion. First, an improved encoder-decoder is designed to preliminarily extract multi-scale features from the magnitude spectrogram of the mixture music. Then the FEM further refines these features at different scales, and soft masks are finally generated to separate each source. The proposed network captures the main features of multi-scale spectrogram images and makes full use of its learned parameters. We conducted experiments on the MIR-1K and DSD100 datasets. Our network achieved outstanding performance on the MIR-1K dataset and acquired competitive results on the DSD100 dataset compared with state-of-the-art methods in singing voice separation and source separation tasks.
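The final separation step described above, applying network-generated soft masks to the mixture magnitude spectrogram, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mask logits would come from the DCNN/FEM, and the softmax normalization over sources is a common convention assumed here.

```python
import numpy as np

def soft_mask_separate(mixture_mag, mask_logits):
    """Apply softmax-normalized soft masks to a mixture magnitude
    spectrogram so the per-source estimates sum back to the mixture.

    mixture_mag: (freq, time) magnitude spectrogram of the mixture
    mask_logits: (n_sources, freq, time) raw scores from the network
                 (hypothetical stand-in for the DCNN/FEM output)
    """
    # Softmax over the source axis yields masks in [0, 1] that sum to 1
    exp = np.exp(mask_logits - mask_logits.max(axis=0, keepdims=True))
    masks = exp / exp.sum(axis=0, keepdims=True)
    # Each source estimate is the mask times the mixture magnitude
    return masks * mixture_mag[None, :, :]

# Toy example: 2 sources (e.g. vocals and accompaniment)
rng = np.random.default_rng(0)
mix = rng.random((513, 100))                 # |STFT| of the mixture
logits = rng.standard_normal((2, 513, 100))  # placeholder network output
sources = soft_mask_separate(mix, logits)
# Because the masks sum to 1, the source estimates sum to the mixture
assert np.allclose(sources.sum(axis=0), mix)
```

In practice the masked magnitudes would be combined with the mixture phase and inverted with an inverse STFT to recover time-domain audio for each source.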
