A Music Classification model based on metric learning applied to MP3 audio files

Angelo Cesar Mendes Da Silva, Maurício Archanjo Nunes Coelho, Raul Fonseca Neto

doi:10.1016/j.eswa.2019.113071

Abstract

The development of models that learn music similarity from audio media files is an increasingly important task for the entertainment industry. This work proposes a novel music classification model based on metric learning whose main objective is to learn a personalized metric for each customer. The metric learning process learns a set of parameterized distances, using a structured prediction approach, from a set of MP3 audio files covering several music genres chosen according to the user's taste. The structured prediction solution aims to maximize the separation margin between genre centroids and to minimize the overall intra-cluster distances. To extract the acoustic information, we use Mel-Frequency Cepstral Coefficients (MFCC) and reduce their dimensionality using Principal Component Analysis (PCA). We verify the model's validity by performing a set of experiments and comparing the training and testing results with baseline algorithms, such as K-means and Soft Margin Linear Support Vector Machine (SVM). Also, to demonstrate the prediction capacity, we compare our results with two recent works that report good prediction results on the GTZAN dataset. Experiments show promising results and encourage the future development of an online version of the learning model to be applied in streaming platforms.
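The pipeline outlined in the abstract (feature vectors, PCA reduction, classification by a parameterized distance to genre centroids) can be illustrated with a minimal sketch. This is not the authors' implementation. The synthetic vectors stand in for per-track MFCC summaries, the function names are hypothetical, and the weight vector `w` is fixed to ones rather than learned by the structured prediction procedure the paper proposes.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the feature mean and the top principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project X onto the retained principal directions."""
    return (X - mean) @ components.T

def genre_centroids(Z, labels):
    """Compute one centroid per genre in the reduced space."""
    genres = np.unique(labels)
    return genres, np.stack([Z[labels == g].mean(axis=0) for g in genres])

def weighted_sq_distances(Z, centroids, w):
    """Squared distance from each sample to each centroid, weighted per dimension by w."""
    diff = Z[:, None, :] - centroids[None, :, :]
    return (w * diff ** 2).sum(axis=2)

def predict(Z, genres, centroids, w):
    """Assign each sample to the genre of its nearest centroid."""
    return genres[weighted_sq_distances(Z, centroids, w).argmin(axis=1)]

# Synthetic stand-in for MFCC summaries: two well-separated "genres", 20 features each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 20)), rng.normal(3.0, 1.0, (50, 20))])
y = np.array([0] * 50 + [1] * 50)

mean, components = pca_fit(X, n_components=5)
Z = pca_transform(X, mean, components)
genres, centroids = genre_centroids(Z, y)

# Assumption: uniform weights. A metric learning method would fit w
# (for example, one weight vector per user) instead of fixing it.
w = np.ones(Z.shape[1])
predictions = predict(Z, genres, centroids, w)
accuracy = (predictions == y).mean()
```

With uniform weights this reduces to an ordinary nearest-centroid classifier. Learning `w` would reshape the distance so that genre centroids move apart while the tracks of each genre stay close to their own centroid, which is the objective the abstract describes.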

Full Text