Abstract
Music classification using deep neural networks has gained considerable attention in recent years, driven by the difficulty of capturing every essential aspect of music in features and of interpreting the resulting classifiers. Research on integrating VGG16 with RNNs remains limited, and few existing classifiers accurately capture intrinsic musical characteristics. Previous work in this field has primarily focused on spectral features, which has constrained overall performance. To address this issue, we propose a novel hybrid neural architecture based on Visual Geometry Group 16 (VGG16), which is highly effective at extracting salient features from musical variations. We combine VGG16 with several recurrent neural network (RNN) variants, including the Gated Recurrent Unit (GRU), Bidirectional GRU (BiGRU), Long Short-Term Memory (LSTM), and Bidirectional LSTM (BiLSTM). Additionally, we compare their performance on the GTZAN dataset using both Mel-spectrogram and Mel-Frequency Cepstral Coefficient (MFCC) features. Our results indicate that the VGG16+GRU model achieved the highest accuracy: 89.60% with Mel spectrograms and 82.70% with MFCC features. These findings demonstrate the effectiveness of combining advanced feature extraction techniques with deep learning models for music genre classification.
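Both feature types compared above, Mel spectrograms and MFCCs, are built on the mel scale, a perceptual pitch scale that spaces frequencies the way human hearing does. As a minimal, self-contained illustration (using the standard HTK-style mel formula, not code from the paper itself), the Hz-to-mel conversion and its inverse can be sketched as:

```python
import math

def hz_to_mel(hz: float) -> float:
    """Convert a frequency in Hz to the mel scale (HTK formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel: convert a mel value back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# The scale is calibrated so that 1000 Hz maps to roughly 1000 mels;
# above ~1 kHz it grows logarithmically, compressing high frequencies.
print(hz_to_mel(1000.0))            # close to 1000
print(mel_to_hz(hz_to_mel(440.0)))  # round-trips to 440 Hz
```

Mel-spectrogram pipelines apply a bank of triangular filters spaced evenly on this scale; MFCCs additionally take a discrete cosine transform of the log filterbank energies, which is why the two feature sets can yield different classifier accuracies, as the abstract reports.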