Abstract
Deep Recurrent Neural Network (DRNN) based monaural singing voice separation (MSVS) methods have recently obtained impressive separation results. Most DRNN-based methods directly take the magnitude spectra of the mixture signal as the input feature, which is high-dimensional and contains redundant information. DRNN-based models, however, cannot extract effective low-dimensional, de-redundant representations from the magnitude spectra. In this paper, we propose an Enhanced Feature Network (EFN) to extract effective representations of the magnitude spectra, i.e., the enhanced-feature, for MSVS. The enhanced-feature is generated in two consecutive stages: (i) modeling the local and contextual information explicitly with a Convolutional Neural Network (CNN); (ii) extracting the high-level sequential feature with a Highway Network and a bi-directional Recurrent Neural Network (RNN). In the first stage, the EFN generates an enhanced-sequence consisting of the high-resolution magnitude spectra and their low-dimensional representations, where the low-dimensional part avoids the high cost of spectra decomposition and the high-resolution part mitigates information loss. In the second stage, the enhanced-sequence is used to extract the enhanced-feature, which is more suitable for MSVS. Experiments on the MIR-1K dataset show that the enhanced-feature yields better separation than either the magnitude spectra or their low-dimensional representations. Compared with previously proposed DRNN-based methods, the proposed method obtains a 0.16–0.31 dB GNSDR gain and a 0.48–0.71 dB GSAR gain. Moreover, the separation module of the EFN, which adopts only one hidden layer of GRU RNN, noticeably increases the training speed.
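The two ideas named above, an enhanced-sequence that keeps the high-resolution spectra alongside a low-dimensional summary, and a Highway layer applied afterwards, can be illustrated with a minimal pure-Python sketch. This is not the EFN itself: the function names are hypothetical, average pooling over frequency bins is swapped in as a stand-in for the CNN, the pooling factor is arbitrary, and the Highway layer uses the standard gated form y = T(x)·H(x) + (1 − T(x))·x with a tanh transform chosen only for illustration. The bi-directional RNN and the GRU separation module are omitted.

```python
import math

def low_dim(frame, factor):
    # Hypothetical stand-in for the CNN: average each group of `factor`
    # adjacent frequency bins to get a low-dimensional summary of the frame.
    groups = [frame[i:i + factor] for i in range(0, len(frame), factor)]
    return [sum(g) / len(g) for g in groups]

def enhanced_sequence(spectra, factor=4):
    # For every time frame, keep the high-resolution magnitude spectrum
    # and append its low-dimensional summary (simple concatenation).
    return [list(frame) + low_dim(frame, factor) for frame in spectra]

def affine(weights, bias, x):
    # Dense layer: weights is a list of rows, one row per output unit.
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def highway(x, w_h, b_h, w_t, b_t):
    # Standard highway layer: the gate t mixes a transformed signal h
    # with the untouched input x, so input and output sizes must match.
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]
    t = [1.0 / (1.0 + math.exp(-v)) for v in affine(w_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]
```

For an 8-bin frame and `factor=4`, `enhanced_sequence` returns 10 values per frame: the 8 original bins followed by 2 pooled values. With a strongly negative gate bias the Highway layer passes its input through almost unchanged, which is the property that makes such layers easy to stack.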