Double Coated VGG16 Architecture: An Enhanced Approach for Genre Classification of Spectrographic Representation of Musical Pieces

Partha Protim Das, Aundrila Acharjee, Marium-E-Jannat

doi:10.1109/iccit48885.2019.9038339

Abstract

In the field of Music Information Retrieval (MIR), music genre classification is one of the most demanding tasks. The search for automated categorization of music by genre has yielded diverse techniques, the most prominent of which rely on machine learning and, more recently, deep learning. In this paper we approach this problem with Convolutional Neural Networks (CNNs), which have proven very useful for image classification over the past few years. We experiment with several popular CNN architectures, including ResNet and VGG; of these, VGG16, a variant of the latter, outperforms the rest by a considerable margin. On top of the vanilla VGG16 architecture, we add two dense layers tuned for this specific task. We take audio clips from each of ten genres, convert them to spectrogram images, and feed these images into the network. With this approach we achieve 84% accuracy, a promising result for the genre classification problem.
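To make the audio-to-image step concrete, the following is a minimal sketch of how a clip might be turned into a spectrogram image suitable for an image CNN. The abstract does not state the spectrogram settings, so the frame length, hop size, Hann window, log scaling, and the three-channel conversion are all assumptions for illustration, and a synthetic 440 Hz tone stands in for a real music clip.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram of shape (freq_bins, frames).

    frame_len and hop are illustrative values, not the paper's settings.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Magnitude of the one-sided FFT of each frame.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression so quiet components remain visible.
    return np.log1p(magnitude).T


def to_image(spec):
    """Scale a spectrogram to 0..255 and repeat it across three channels."""
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo + 1e-12)
    gray = (scaled * 255).astype(np.uint8)
    # Image CNNs such as VGG16 usually expect three input channels.
    return np.repeat(gray[:, :, None], 3, axis=2)


# Synthetic one-second test tone (assumption: stands in for a real clip).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = log_spectrogram(tone)
image = to_image(spec)

# The strongest frequency bin should sit near the tone's 440 Hz.
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 256
print(spec.shape, image.shape, peak_hz)
```

In practice the resulting image would still need resizing to the network's input resolution before being passed to the classifier; that step is omitted here.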

Full Text