Abstract

In this paper, we propose a novel deep convolutional neural network (DCNN) embedded with our feature extraction module (FEM) for monaural music source separation. UNet++ is introduced into our FEM for highly flexible feature fusion. First, an improved encoder-decoder is designed to preliminarily extract multi-scale features from the magnitude spectrogram of the mixture music. Then the FEM further refines these features at different scales, and soft masks are finally generated to separate each source. The proposed network captures the main features of multi-scale spectrogram images and makes full use of its learned parameters. We conducted experiments on the MIR-1K and DSD100 datasets. Our network achieved outstanding performance on the MIR-1K dataset and acquired competitive results on the DSD100 dataset compared with state-of-the-art methods in singing voice separation and source separation tasks.
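The final separation step described above, applying network-generated soft masks to the mixture magnitude spectrogram, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mask logits would come from the DCNN/FEM, and the softmax normalization over sources is a common convention assumed here.

```python
import numpy as np

def soft_mask_separate(mixture_mag, mask_logits):
    """Apply softmax-normalized soft masks to a mixture magnitude
    spectrogram so the per-source estimates sum back to the mixture.

    mixture_mag: (freq, time) magnitude spectrogram of the mixture
    mask_logits: (n_sources, freq, time) raw scores from the network
                 (hypothetical stand-in for the DCNN/FEM output)
    """
    # Softmax over the source axis yields masks in [0, 1] that sum to 1
    exp = np.exp(mask_logits - mask_logits.max(axis=0, keepdims=True))
    masks = exp / exp.sum(axis=0, keepdims=True)
    # Each source estimate is the mask times the mixture magnitude
    return masks * mixture_mag[None, :, :]

# Toy example: 2 sources (e.g. vocals and accompaniment)
rng = np.random.default_rng(0)
mix = rng.random((513, 100))                 # |STFT| of the mixture
logits = rng.standard_normal((2, 513, 100))  # placeholder network output
sources = soft_mask_separate(mix, logits)
# Because the masks sum to 1, the source estimates sum to the mixture
assert np.allclose(sources.sum(axis=0), mix)
```

In practice the masked magnitudes would be combined with the mixture phase and inverted with an inverse STFT to recover time-domain audio for each source.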
