Abstract

Music source feature extraction is an important research direction in music information retrieval and music recommendation systems. To extract the features of music sources more effectively, this study introduces a jump attention mechanism and combines it with the convolutional attention module. A feature extraction module based on UNet++ and a spatial attention module is also proposed. In addition, the phase information of the mixed music signals is used to improve network performance. Results show that the model performs well in separating vocals and accompaniment. For vocal separation on the MIR-1K dataset, the model achieves 11.25 dB, 17.34 dB, and 13.83 dB on the three evaluation metrics, respectively. For drum separation on the DSD100 dataset, the model achieves a median signal-to-distortion ratio (SDR) of 4.36 dB, which is 2.91 dB higher than that of the Spectral Hierarchical Network model. For bass and vocal separation, the model's median SDRs reach 4.87 dB and 6.09 dB, respectively, again exceeding the Spectral Hierarchical Network model. These results indicate that the model has significant performance advantages in music source feature extraction and separation, with important application value in music production and speech recognition.
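
The abstract describes the spatial attention component only at a high level. As a rough illustration, the following is a minimal PyTorch sketch of a CBAM-style spatial attention block applied to spectrogram feature maps, of the general kind the abstract names; the class name, kernel size, and tensor shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of a CBAM-style spatial
# attention module for spectrogram feature maps. All names and
# shapes below are illustrative assumptions.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Re-weights each time-frequency location of a feature map."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # A single conv maps pooled channel statistics to a 1-channel attention map.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, freq, time) spectrogram features.
        avg_pool = torch.mean(x, dim=1, keepdim=True)    # channel-wise mean
        max_pool, _ = torch.max(x, dim=1, keepdim=True)  # channel-wise max
        attn = torch.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))
        return x * attn  # scale features by the learned spatial attention map

if __name__ == "__main__":
    feats = torch.randn(2, 32, 513, 128)  # toy batch of spectrogram features
    out = SpatialAttention()(feats)
    print(out.shape)  # torch.Size([2, 32, 513, 128])
```

In a UNet++-style encoder-decoder, a block like this would typically sit on the skip paths or decoder stages so that the network emphasizes informative time-frequency regions before the separation masks are estimated.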
