On the application of deep learning and multifractal techniques to classify emotions and instruments using Indian Classical Music

Sayan Nag,Medha Basu,Shankha Sanyal,Archi Banerjee,Dipak Ghosh

doi:10.1016/j.physa.2022.127261

Abstract

Music is often considered as the language of emotions. The way it stimulates the emotional appraisal across people from different communities, culture and demographics has long been known and hence categorizing on the basis of emotions is indeed an intriguing basic research area. Indian Classical Music (ICM) is famous for its ambiguous nature, i.e. its ability to evoke a number of mixed emotions through only a single musical narration, and hence classifying evoked emotions from ICM becomes a more challenging task. With the rapid advancements in the field of Deep Learning, this Music Emotion Recognition (MER) task is becoming more and more relevant and robust, hence can be applied to one of the most challenging test case i.e. classifying emotions elicited from ICM. In this paper we present a new dataset called JUMusEmoDB which presently has 1600 audio clips (approximately 30 s each) where 400 clips each correspond to happy, sad, calm and anxiety emotional scales. The initial annotations and emotional classification of the database was done based on an emotional rating test (5-point Likert scale) performed by 100 participants. The clips have been taken from different conventional ‘raga’ renditions played in two Indian stringed instruments – sitar and sarod by eminent maestros of ICM and digitized in 44.1 kHz frequency. The ragas, which are unique to ICM, are described as musical structures capable of inducing different moods or emotions. For supervised classification purposes, we have used Convolutional Neural Network (CNN) based architectures (resnet50, mobilenet v2.0, squeezenet v1.0 and a proposed ODE-Net) on corresponding music spectrograms of the 6400 sub-clips (where every clip was segmented into 4 sub-clips) which contain both time as well as frequency domain information. Along with emotion classification, instrument classification based response was also attempted on the same dataset using the CNN based architectures. In this context, a nonlinear technique, Multifractal Detrended Fluctuation Analysis (MFDFA) was also applied on the musical clips to classify them on the basis of complexity values extracted from the method. The initial classification accuracy obtained from the applied methods are quite inspiring and have been corroborated with ANOVA results to determine the statistical significance. This type of CNN based classification algorithm using a rich corpus of Indian Classical Music is unique even in the global perspective and can be replicated in other modalities of music also. The link to this newly developed dataset has been provided in the dataset description section of the paper. This dataset is still under development and we plan to include more data containing other emotional as well as instrumental entities into consideration.

Full Text