Abstract

With unlabeled music data widely available, it is necessary to build an unsupervised latent music representation extractor to improve the performance of classification models. This paper proposes an unsupervised latent music representation learning method based on a deep 3D convolutional denoising autoencoder (3D-DCDAE) for music genre classification, which aims to learn common representations from a large amount of unlabeled data to improve the performance of music genre classification. Specifically, unlabeled MIDI files are fed to the 3D-DCDAE, which extracts latent representations by denoising and reconstructing the input data. A decoder assists the 3D-DCDAE during training; after training, the decoder is replaced by a multilayer perceptron (MLP) classifier for music genre classification. Through this unsupervised latent representation learning method, unlabeled data can be applied to classification tasks, alleviating the problem of classification performance being limited by insufficient labeled data. In addition, the unsupervised 3D-DCDAE can consider the musicological structure of the input, expanding the model's understanding of the music domain and improving performance in music genre classification. In the experiments, which utilized the Lakh MIDI dataset, a large amount of unlabeled data was used to train the 3D-DCDAE, obtaining a denoising and reconstruction accuracy of approximately 98%. A small amount of labeled data was used to train a classification model consisting of the trained 3D-DCDAE and the MLP classifier, which achieved a classification accuracy of approximately 88%. The experimental results show that the model achieves state-of-the-art performance and significantly outperforms other methods for music genre classification with only a small amount of labeled data.
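The two-stage workflow the abstract describes (denoising-autoencoder pretraining on unlabeled data, then swapping the decoder for an MLP classifier head) can be sketched with a toy linear model. This is only a minimal illustration: the layer sizes, noise level, and four-genre head below are hypothetical stand-ins for the paper's 3D convolutional architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a batch of piano-roll inputs, flattened to (batch, features);
# the actual 3D-DCDAE convolves over full 3D piano-roll volumes.
X = rng.random((64, 32))

# --- Stage 1: denoising pretraining (hypothetical sizes: 32 -> 8 -> 32) ---
W_enc = rng.normal(scale=0.1, size=(32, 8))
W_dec = rng.normal(scale=0.1, size=(8, 32))
lr = 0.05

mse_before = np.mean((X @ W_enc @ W_dec - X) ** 2)
for _ in range(300):
    X_noisy = X + rng.normal(scale=0.1, size=X.shape)  # corrupt the input
    H = X_noisy @ W_enc                                # latent representation
    X_hat = H @ W_dec                                  # reconstruction
    err = X_hat - X                                    # target is the CLEAN input
    W_dec -= lr * H.T @ err / len(X)                   # gradient step on mean squared error
    W_enc -= lr * X_noisy.T @ (err @ W_dec.T) / len(X)
mse_after = np.mean((X @ W_enc @ W_dec - X) ** 2)

# --- Stage 2: drop the decoder, attach a classifier head on the latent codes ---
latent = X @ W_enc                          # encoder now acts as a feature extractor
W_clf = rng.normal(scale=0.1, size=(8, 4))  # MLP head for 4 hypothetical genres
logits = latent @ W_clf                     # would be fine-tuned on the small labeled set
print(logits.shape, mse_after < mse_before)
```

The key design point carried over from the paper is that reconstruction is scored against the clean input while the encoder only ever sees the corrupted version, which is what forces the latent codes to capture structure rather than noise.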

Highlights

  • In recent years, a series of methods represented by hierarchical and deep layers have been proposed, which give hope for training deep models

  • These methods have been successful in several application areas, such as music information retrieval [1], computer vision (CV) [2], and natural language processing (NLP) [3]

  • In this paper, an effective autoencoder, serving as a latent representation extractor, and a multilayer perceptron (MLP) classifier are designed for music genre classification

Summary

Introduction

A series of methods represented by hierarchical and deep layers have been proposed, which give hope for training deep models. Although only a limited amount of labeled data is available for training classification models, common classifiers that utilize latent representations as input can perform well in a given domain. An unsupervised music latent representation learning method based on a deep 3D convolutional denoising autoencoder (3D-DCDAE) is proposed for music genre classification. The experimental results show that the 3D-DCDAE extracts robust latent representations of music from a large amount of data and considers the musicological structure of music, giving the proposed model strong generalization ability. Unlike most existing research, the proposed method utilizes music data in MIDI format as input, which allows the model to consider a variety of music features.
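The introduction notes that the method takes MIDI rather than audio as input. A common way to make symbolic music convolvable is to rasterize note events into a piano-roll volume (time x pitch x track). The sketch below shows the idea with invented events; the event format, grid sizes, and binary encoding are illustrative assumptions, not the paper's exact preprocessing.

```python
# Hypothetical grid: 16 time steps, 128 MIDI pitches, 2 instrument tracks.
N_STEPS, N_PITCHES, N_TRACKS = 16, 128, 2

# Invented MIDI-like note events as (track, pitch, start_step, end_step);
# real code would parse these from a MIDI file.
events = [
    (0, 60, 0, 4),   # middle C on track 0, held for 4 steps
    (0, 64, 4, 8),   # E4 on track 0
    (1, 36, 0, 16),  # bass note held on track 1 for the whole window
]

# Binary occupancy volume: roll[t][p][k] == 1 iff track k sounds pitch p at step t.
roll = [[[0] * N_TRACKS for _ in range(N_PITCHES)] for _ in range(N_STEPS)]
for track, pitch, start, end in events:
    for t in range(start, min(end, N_STEPS)):
        roll[t][pitch][track] = 1

active = sum(sum(sum(p) for p in step) for step in roll)
print(active)  # total active (time, pitch, track) cells: 4 + 4 + 16 = 24
```

A 3D tensor like this preserves pitch intervals, rhythmic placement, and inter-track relationships simultaneously, which is what allows 3D convolutions to pick up on musicological structure.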

Related Works
Music Genre Classification
Comparison of Music Genre Classification Based on Deep Learning
Music Genre Classification System Based on 3D-DCDAE
Overview
MLP Classifier
Experimental Data
Experimental Results
Results