Abstract

Artificial Intelligence (AI) and Machine Learning are among the greatest technological advancements of this century. They are revolutionizing the fields of computing, finance, healthcare, agriculture, music, space and tourism, and powerful models have achieved excellent performance on a myriad of complex learning tasks. One subset of AI is audio analysis, which encompasses music information retrieval, music generation and music classification. Music data is one of the most abstruse types of source data, mainly because it is difficult to extract meaningful, correlated features from it. Hence a myriad of algorithms, ranging from classical models to hybrid neural networks, have been tried on music data in pursuit of good accuracy. This paper studies the various methods that can be used for music genre classification and compares them. The accuracies we obtained on a small sample of the Free Music Archive (FMA) dataset were: 46% using Support Vector Classifier (SVC), 40% using Logistic Regression, 67% using Artificial Neural Network (ANN), 77% using Convolutional Neural Network (CNN), 90% using Convolution-Recurrent Neural Network (CRNN), 88% using Parallel Convolution-Recurrent Neural Network (PCRNN), 73% without an ensemble technique and 85% with the AdaBoost ensemble technique. We took SVC, at 46% accuracy, as our baseline model and designed the subsequent models to exceed it. ANN gave us 67% on the test dataset, which was surpassed by CNN at 77%. We noticed that image-based features worked better for classifying the labels than features extracted directly from the audio. A combination of CNN and RNN worked best for the dataset, with the series CRNN model giving the highest accuracy. After that, we fitted an ensemble model to our dataset and analyzed its workings.
This paper presents a comprehensive study of the various methods that can be used for music genre classification, with a focus on some parallel models and ensembling techniques.
