Application of Unet-SE-Bisru Algorithm for Music Signal Processing in Music Source Separation

Tao Zhang

doi:10.12694/scpe.v25i4.2899

Abstract

At present, end-to-end neural network models based on time-domain deep learning suffer from long training times and limited performance in music source separation. To address this issue, a U-Net squeeze-and-excitation bidirectional simple recurrent unit model (hereinafter Unet-SE-Bisru) is proposed on the basis of the deep extractor model. The model replaces the bidirectional long short-term memory (BiLSTM) network with a bidirectional simple recurrent unit (BiSRU). It then introduces an attention mechanism into the encoding and decoding layers, where a squeeze-and-excitation (SE) block selectively extracts features according to the type of audio to be separated. Finally, group normalization is added after the one-dimensional convolutions, and the effectiveness of the model is verified experimentally. The experimental results show that with the BiSRU, the improved model reaches a signal-to-distortion ratio (SDR) of 5.68 dB, higher than the 5.55 dB obtained with the BiLSTM. Adding the SE module raises the scores by about 0.1 to 0.5 dB overall. In the model comparison with the same number of channels, the improved model scores 5.68 dB, 5.91 dB, and 11.28 dB on the three evaluation metrics, all higher than the baseline model. Compared with other music source separation models, the improved model has better overall separation performance: although some of its metrics are lower than those of the comparison models, its SDRs for drums and bass, 6.11 dB and 6.36 dB, exceed those of the comparison models. Overall, the improved model performs well in music source separation for music signal processing and can be applied effectively to practical music source separation tasks.
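The abstract names three building blocks: an SE block in the encoding and decoding layers, group normalization after the one-dimensional convolutions, and a BiSRU in place of the BiLSTM. The plain-Python sketch below illustrates each one on a single (channels, time) feature map. It is not the paper's implementation; it assumes randomly initialized weights, a reduction ratio of 4 in the SE block, no learned affine terms in group normalization, and a simplified SRU recurrence in which both gates depend only on x_t. The convolution itself is omitted, and every function name is hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Hypothetical random init standing in for trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def group_norm(x, num_groups, eps=1e-5):
    """Group normalization of a (channels, time) map, no learned affine terms."""
    assert len(x) % num_groups == 0
    size = len(x) // num_groups
    out = []
    for g in range(num_groups):
        block = x[g * size:(g + 1) * size]
        vals = [v for row in block for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        for row in block:
            out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def se_block(x, W1, W2):
    """Squeeze-and-excitation: reweight each channel of a (channels, time) map."""
    squeeze = [sum(row) / len(row) for row in x]          # average over time
    hidden = [max(0.0, h) for h in matvec(W1, squeeze)]   # FC C -> C/r, ReLU
    scale = [sigmoid(s) for s in matvec(W2, hidden)]      # FC C/r -> C, sigmoid
    return [[s * v for v in row] for s, row in zip(scale, x)], scale


def sru(xs, W, Wf, bf, Wr, br):
    """Simplified SRU over a list of input vectors (input size == hidden size)."""
    # Parallel part: every matrix product depends only on x_t.
    proj = [matvec(W, x) for x in xs]
    f = [[sigmoid(a + b) for a, b in zip(matvec(Wf, x), bf)] for x in xs]
    r = [[sigmoid(a + b) for a, b in zip(matvec(Wr, x), br)] for x in xs]
    c = [0.0] * len(xs[0])
    hs = []
    for t, x in enumerate(xs):
        # Sequential part: elementwise operations only.
        c = [ft * ct + (1 - ft) * pt for ft, ct, pt in zip(f[t], c, proj[t])]
        hs.append([rt * math.tanh(ct) + (1 - rt) * xv
                   for rt, ct, xv in zip(r[t], c, x)])
    return hs


def bisru(xs, fwd_params, bwd_params):
    """Run one SRU forward and one backward in time, concatenating outputs."""
    forward = sru(xs, *fwd_params)
    backward = sru(xs[::-1], *bwd_params)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


rng = random.Random(0)
C, T, reduction = 8, 16, 4
raw = [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(C)]
normed = group_norm(raw, num_groups=2)    # stands in for GroupNorm after a Conv1d
y, scale = se_block(normed, rand_matrix(C // reduction, C, rng),
                    rand_matrix(C, C // reduction, rng))
seq = [list(col) for col in zip(*y)]      # (C, T) -> T vectors of size C


def sru_params():
    """Hypothetical (W, Wf, bf, Wr, br) tuple for one SRU direction."""
    return (rand_matrix(C, C, rng), rand_matrix(C, C, rng), [0.0] * C,
            rand_matrix(C, C, rng), [0.0] * C)


out = bisru(seq, sru_params(), sru_params())
print(len(out), len(out[0]))  # prints: 16 16
```

The split inside `sru` is the property that motivates replacing the LSTM: every matrix product uses only x_t, so it can be computed for all time steps at once, and only an elementwise loop remains sequential. A practical model would express these blocks in a deep learning framework with trained weights.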
