Abstract
Music genre classification has long been a central problem in music information retrieval (MIR), and researchers have explored many approaches to solving it. Early feature-based methods, such as support vector machines (SVMs) and k-nearest neighbors (k-NN), relied mainly on handcrafted features such as Mel-frequency cepstral coefficients (MFCCs). As convolutional neural networks (CNNs) have increasingly been used to capture complex time-frequency patterns in audio data, the performance and accuracy of music genre classification have improved dramatically. Recent work has further improved genre classification models by integrating attention and hybrid mechanisms. However, these methods often rely on a single type of input, such as spectrograms or raw audio, and therefore cannot fully capture the multifaceted nature of music. In this research, we propose a novel multi-input neural network architecture that integrates a CNN for Mel spectrogram processing with a multilayer perceptron (MLP) for handling additional feature data extracted from CSV files. This two-branch approach processes spectral and feature-based information jointly, yielding a more comprehensive representation of music data. We evaluate our approach on the GTZAN dataset, which is described in detail later in the paper. By combining the strengths of CNN and MLP methods, the model improves classification accuracy across multiple music genres. Finally, our results show that the system achieves an accuracy of 0.77, with support of 10, recall of 0.67, and an F1 score of 0.65.
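The two-branch architecture described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' exact model: the layer sizes, the 128×128 Mel spectrogram shape, and the 57-dimensional tabular feature vector (matching the GTZAN CSV feature count) are all assumptions chosen for the example; the abstract specifies only that a CNN branch handles Mel spectrograms, an MLP branch handles CSV features, and the two are fused for 10-genre classification.

```python
import torch
import torch.nn as nn

class TwoBranchGenreNet(nn.Module):
    """Hypothetical sketch of a multi-input genre classifier:
    a CNN branch for Mel spectrograms and an MLP branch for
    tabular (CSV) features, fused by concatenation."""

    def __init__(self, n_tabular: int = 57, n_classes: int = 10):
        super().__init__()
        # CNN branch: (B, 1, 128, 128) spectrogram -> (B, 32) embedding
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # MLP branch: (B, n_tabular) feature vector -> (B, 32) embedding
        self.mlp = nn.Sequential(
            nn.Linear(n_tabular, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        # Fusion head: concatenated embeddings -> genre logits
        self.head = nn.Linear(32 + 32, n_classes)

    def forward(self, mel: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.cnn(mel), self.mlp(feats)], dim=1)
        return self.head(fused)

model = TwoBranchGenreNet()
# Dummy batch of 2: Mel spectrograms plus matching CSV feature rows.
logits = model(torch.randn(2, 1, 128, 128), torch.randn(2, 57))
```

Concatenation is the simplest fusion strategy; the two branches are trained end-to-end so that each learns an embedding complementary to the other.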