Data augmentation techniques for classification of music genre

Aixin Zhang, Hanyu Zhang

doi:10.54254/2755-2721/33/20230257

Abstract

Music genre classification, a central task in audio processing, has been widely investigated using a range of machine learning methods. However, it remains unclear whether Long Short-Term Memory (LSTM) networks or Convolutional Neural Networks (CNNs) are better suited to the task, and how best to augment the data used to train these models is still an open question. The objective of this research is to identify the most effective combination of network and augmentation technique and thereby improve the accuracy of music genre classification. To this end, the study compares several data augmentation techniques across both types of neural network. Both a CNN and an LSTM network are optimized and trained. The dataset is augmented by extracting Mel Frequency Cepstral Coefficient (MFCC) features from music recordings using three different sampling techniques, which allows the effectiveness of each method for genre classification to be assessed in detail. The experimental findings show that the combination of a CNN with random sampling outperforms all other tested configurations, yielding a significant improvement in genre classification accuracy. These findings can inform future studies seeking more effective techniques for classifying music genres.
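The abstract does not spell out how the sampling techniques work, so the following is only a minimal sketch of one plausible reading of random sampling: cropping fixed-length segments from a track at random start offsets, so that a single recording yields several training examples. The helper name `random_segments`, the 3-second segment length, the 22,050 Hz sample rate, and the synthetic sine-wave "track" are all assumptions made for illustration, not details taken from the paper. MFCC features would then be computed for each segment with an audio library, which is not shown here.

```python
import numpy as np

def random_segments(signal, segment_len, n_segments, seed=0):
    """Crop n_segments windows of segment_len samples at random offsets.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(signal) < segment_len:
        raise ValueError("signal is shorter than segment_len")
    rng = np.random.default_rng(seed)
    # Upper bound is exclusive, so the last valid start is len - segment_len.
    starts = rng.integers(0, len(signal) - segment_len + 1, size=n_segments)
    return np.stack([signal[s:s + segment_len] for s in starts])

# Assumed values: 22,050 Hz sample rate, 30 s track, 3 s segments.
sr = 22050
track = np.sin(2 * np.pi * 440 * np.arange(30 * sr) / sr).astype(np.float32)

segments = random_segments(track, segment_len=3 * sr, n_segments=10)
print(segments.shape)  # (10, 66150)
```

Fixing the seed keeps the sampled segments reproducible across runs; varying it per epoch would instead expose the model to different crops of the same recording.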

Full Text