Abstract

In applications such as noise removal, speech separation, and music reconstruction, it is important to decompose a single-channel recording into its constituent sources. In this paper, we address single-channel audio source separation and the cocktail party problem using supervised learning, with deep learning as the basis of the method. The data consist of audio-visual videos and recordings of Persian musical instruments, which serve as the audio sources. The proposed algorithm is the first deep-learning-based method for separating the sound sources of Persian musical instruments. In this method, the audio track is first extracted from the video signal; the resulting audio signal, a mixture of sources, is then separated and decomposed by a deep neural network model. According to our evaluations, the proposed method achieves good quantitative and qualitative results when separating two audio sources. Moreover, the method is a general solution that can be applied to a wide range of audio sources and to mixtures of more than two sources.
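The pipeline summarized above (extract the audio track, then estimate each source from the single-channel mixture with a deep network) can be illustrated as mask-based separation in the time-frequency domain. The following PyTorch sketch is only an illustrative assumption about how such a system might be structured, not the authors' architecture; the names (SeparationNet, N_FFT, HOP), the LSTM sizes, and the two-source setting are hypothetical choices.

```python
# Minimal sketch of mask-based single-channel source separation (assumed design,
# not the paper's implementation). A network predicts one magnitude mask per
# source from the mixture spectrogram; masked spectrograms are inverted with
# the mixture phase to recover waveforms.
import torch
import torch.nn as nn

N_FFT = 1024          # STFT size (assumed)
HOP = 256             # hop length (assumed)
N_SOURCES = 2         # two-source separation, as evaluated in the paper
N_BINS = N_FFT // 2 + 1


class SeparationNet(nn.Module):
    """Predicts one sigmoid mask per source from the mixture magnitude."""

    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(N_BINS, 300, num_layers=2, batch_first=True)
        self.fc = nn.Linear(300, N_BINS * N_SOURCES)

    def forward(self, mag):                     # mag: (batch, frames, bins)
        h, _ = self.rnn(mag)
        masks = torch.sigmoid(self.fc(h))
        return masks.view(mag.size(0), mag.size(1), N_SOURCES, N_BINS)


def separate(mixture, model):
    """mixture: (samples,) mono waveform -> list of estimated source waveforms."""
    window = torch.hann_window(N_FFT)
    spec = torch.stft(mixture, N_FFT, HOP, window=window, return_complex=True)
    mag, phase = spec.abs(), torch.angle(spec)   # each (bins, frames)
    masks = model(mag.T.unsqueeze(0))[0]         # (frames, sources, bins)
    sources = []
    for s in range(N_SOURCES):
        est_mag = masks[:, s, :].T * mag                 # mask the mixture magnitude
        est_spec = est_mag * torch.exp(1j * phase)       # reuse the mixture phase
        sources.append(torch.istft(est_spec, N_FFT, HOP, window=window))
    return sources
```

In such a setup the network would typically be trained on synthetic mixtures of isolated instrument recordings, with a spectrogram or waveform reconstruction loss against the known sources.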
