Enhanced feature network for monaural singing voice separation

Weitao Yuan, Boxin He, Shengbei Wang, Jianming Wang, Masashi Unoki

doi:10.1016/j.specom.2018.11.004

Abstract

Monaural singing voice separation (MSVS) methods based on Deep Recurrent Neural Networks (DRNNs) have recently achieved impressive separation results. Most DRNN-based methods take the magnitude spectra of the mixture signal directly as the input feature, which is high-dimensional and contains redundant information. DRNN-based models, however, cannot extract effective low-dimensional, de-redundant representations from the magnitude spectra. In this paper, we propose an Enhanced Feature Network (EFN) to extract effective representations of the magnitude spectra, i.e., the enhanced-feature, for MSVS. The enhanced-feature is generated in two consecutive stages: (i) modeling the local and contextual information explicitly with a Convolutional Neural Network (CNN); (ii) extracting the high-level sequential feature with a Highway Network and a bi-directional Recurrent Neural Network (RNN). In the first stage, the EFN generates an enhanced-sequence consisting of the high-resolution magnitude spectra and their low-dimensional representations, where the low-dimensional part avoids the high cost of spectra decomposition and the high-resolution part mitigates information loss. In the second stage, the enhanced-sequence is used to extract the enhanced-feature, which is more suitable for MSVS. Experiments on the MIR-1K dataset show that the enhanced-feature yields better separation than either the magnitude spectra or their low-dimensional representations. The proposed method obtains a 0.16–0.31 dB GNSDR gain and a 0.48–0.71 dB GSAR gain over previously proposed DRNN-based methods. Moreover, the separation module of the EFN, which uses only one hidden layer of GRU RNN, noticeably increases training speed.
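
The abstract reports its gains in GNSDR and GSAR without defining them. As commonly used in MIR-1K evaluations, NSDR is the SDR of the separated estimate minus the SDR of the unprocessed mixture, and the "global" scores (GNSDR, GSAR) are averages over test clips weighted by clip length. The sketch below illustrates that aggregation only. It assumes the per-clip SDR and SAR values have already been computed by a BSS Eval style toolkit, and that this paper follows the common definition. The function names and all numbers are hypothetical and are not results from the paper.

```python
def nsdr(sdr_estimate, sdr_mixture):
    """Normalized SDR: improvement of the estimate over the raw mixture (dB)."""
    return sdr_estimate - sdr_mixture


def length_weighted_mean(values, lengths):
    """Average per-clip scores, weighting each clip by its length."""
    total = sum(lengths)
    if total <= 0:
        raise ValueError("total clip length must be positive")
    return sum(v * w for v, w in zip(values, lengths)) / total


def gnsdr(clips):
    """Global NSDR over clips given as (sdr_estimate, sdr_mixture, length)."""
    scores = [nsdr(est, mix) for est, mix, _ in clips]
    lengths = [length for _, _, length in clips]
    return length_weighted_mean(scores, lengths)


# Hypothetical per-clip values (dB, dB, seconds), for illustration only.
example_clips = [(8.0, 1.0, 10), (6.0, 2.0, 30)]
example_gnsdr = gnsdr(example_clips)  # (7.0 * 10 + 4.0 * 30) / 40

# GSAR uses the same weighting, applied directly to per-clip SAR values.
example_gsar = length_weighted_mean([10.0, 12.0], [10, 30])

print(f"GNSDR = {example_gnsdr:.2f} dB, GSAR = {example_gsar:.2f} dB")
```

Because longer clips carry more weight, a reported gain of a few tenths of a dB reflects an improvement aggregated over the whole test set rather than on any single clip.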

Journal: Speech Communication
Publication Date: Nov 19, 2018
Citations: 7
