Automatic Music Genre Classification based on CNN and LSTM

Xining Luo

doi:10.54097/hset.v39i.6494

Abstract

Machine learning is finding a growing range of applications and attracting increasing attention. The music industry has benefited from the incorporation of artificial intelligence, especially in music classification, where machines can organize large volumes of data more efficiently than human experts. This paper compares two machine learning models, the Convolutional Neural Network (CNN) and the Long Short-Term Memory network (LSTM), in terms of architecture, functionality, and classification accuracy on empirical data. Both models were trained on two datasets, GTZAN and FMA. The CNN achieved accuracies of 56.0% and 50.5% on GTZAN and FMA respectively, outperforming the LSTM, which achieved 42.0% and 33.5%. The paper aims to analyze the two models’ capability for music genre classification and determine which is better suited to the task. These results can help guide further exploration of computer music.

Full Text