This paper reviews deep neural network (DNN) methods in music source separation (MSS) tools, with emphasis on Spleeter by Deezer, an enhanced deep learning model for music source separation. Spleeter is a set of pre-trained models written in Python using the TensorFlow machine learning library. It was developed by Deezer out of the need to separate a given mixed music track into its constituent instrumental or vocal tracks, usually known as stems. Spleeter offers three pre-trained models, namely the 2-, 4-, and 5-stem separation models, which separate a given mix into 2, 4, and 5 stems respectively and can be used for various needs such as remixing, up-mixing, and music transcription. This paper is the first of its kind to review DNN methods in MSS. In this paper, we examine the purpose and use of Spleeter as well as the technical aspects behind it, which include Artificial Intelligence (AI), Machine Learning, and Deep Learning, and further Time-Frequency (TF) masking and the U-Net Convolutional Neural Network (CNN), which are the methodology and architecture it employs, respectively. From the review, we found that Spleeter by Deezer is one of the latest advancements in the MSS problem, that it has comparatively one of the best signal-to-distortion ratio (SDR), signal-to-artifacts ratio (SAR), signal-to-interference ratio (SIR), and source image-to-spatial distortion ratio (ISR) scores, producing a state-of-the-art solution, and that it has paved the way for greater development in the MSS problem in the future.
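As a rough illustration of the TF masking idea mentioned above, the following minimal sketch builds soft ratio masks from per-stem magnitude estimates and applies them to a mixture spectrogram. It is not Spleeter's code: the function names `soft_masks` and `apply_mask`, the toy values, and the squared-magnitude ratio are assumptions chosen for illustration, and in a real system the estimates would come from a trained network such as a U-Net.

```python
def soft_masks(source_mags, eps=1e-10):
    """Return one ratio mask per stem from estimated magnitude spectrograms.

    source_mags maps a stem name to a 2-D list [frequency][time] of
    non-negative magnitudes. Each mask entry is that stem's share of the
    total squared magnitude in the TF bin (an assumed, commonly used choice).
    """
    names = list(source_mags)
    n_freq = len(source_mags[names[0]])
    n_time = len(source_mags[names[0]][0])
    masks = {name: [[0.0] * n_time for _ in range(n_freq)] for name in names}
    for f in range(n_freq):
        for t in range(n_time):
            # eps avoids division by zero in silent bins
            total = sum(source_mags[n][f][t] ** 2 for n in names) + eps
            for n in names:
                masks[n][f][t] = source_mags[n][f][t] ** 2 / total
    return masks


def apply_mask(mask, mixture):
    """Multiply the mixture spectrogram by a mask, bin by bin."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture)]


# Toy 2-stem example: 2 frequency bins x 3 time frames (hypothetical values).
estimates = {
    "vocals": [[3.0, 0.0, 1.0], [0.0, 2.0, 1.0]],
    "accompaniment": [[1.0, 2.0, 1.0], [4.0, 0.0, 1.0]],
}
mixture = [[4.0, 2.0, 2.0], [4.0, 2.0, 2.0]]

masks = soft_masks(estimates)
stems = {name: apply_mask(mask, mixture) for name, mask in masks.items()}
```

Because the masks in each bin sum to approximately one, the masked stems add back up to the mixture. This is one reason soft masking is a convenient way to turn per-stem estimates into separated signals.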