Abstract

Music genre classification has become an active research area attracting considerable attention. Multi-feature models are widely acknowledged as an effective approach to this task. However, in most existing work the branches of such models operate independently, without interaction, which limits the learning features available for classification. In view of this, we investigate how interaction among the learning features of different branches and layers in a multi-feature model affects the final classification results, and we propose a corresponding middle-level learning feature interaction (MLFI) method based on deep learning. Experimental results show that the proposed method significantly improves the accuracy of music genre classification: the best accuracy on the GTZAN dataset reaches 93.65%, which is superior to most current methods.

Highlights

  • With the rise of music streaming media services, tens of thousands of digital songs have been uploaded to the Internet

  • Hafemann et al. [21] confirmed that a Convolutional Neural Network (CNN) can mine rich texture information from the spectrogram and improve the performance of music classification, because CNNs are highly sensitive to the texture information of an image

  • This paper proposes a middle-level learning feature interaction method using deep learning

Summary

Introduction

With the rise of music streaming services, tens of thousands of digital songs have been uploaded to the Internet. Hafemann et al. [21] confirmed that a Convolutional Neural Network (CNN) can mine rich texture information from the spectrogram and improve the performance of music classification, because CNNs are highly sensitive to the texture information of an image. For this reason, music genre classification based on visual features has achieved remarkable results in recent years. Most existing methods are modifications of CNN models originally designed for image recognition; they take the audio spectrogram, the Mel-frequency spectrogram, and other visual features as input. Our research verifies that, in a multi-feature model using MLFI, interaction among the learning features close to the input and output has the strongest effect on classification accuracy, which is the core contribution of this work. The model extracts the significant differences among the Mel-frequency spectrograms of each genre and performs the audio classification task through autonomous learning. This set of audio features (shown in Table 1) yielded the best results in our experiments.
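As a concrete illustration of the pipeline sketched above, the following Python fragment shows how a log-scaled Mel-frequency spectrogram can be extracted and passed through a minimal two-branch CNN whose middle-level feature maps are exchanged between branches. This is a hedged sketch under stated assumptions, not the paper's implementation: the librosa/PyTorch tooling, the branch depths, the channel counts, and the additive two-way interaction are all illustrative choices.

    # Sketch (not the paper's exact architecture): Mel-spectrogram input
    # plus a two-branch CNN with a middle-level feature interaction.
    import librosa
    import numpy as np
    import torch
    import torch.nn as nn

    def mel_spectrogram(path, sr=22050, n_mels=128):
        """Load an audio clip and return its log-scaled Mel spectrogram."""
        y, sr = librosa.load(path, sr=sr)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
        return librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, frames)

    class TwoBranchMLFI(nn.Module):
        """Two CNN branches whose middle-level features are exchanged."""
        def __init__(self, n_classes=10):  # GTZAN has 10 genres
            super().__init__()
            def block(c_in, c_out):
                return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                                     nn.ReLU(), nn.MaxPool2d(2))
            self.a1, self.b1 = block(1, 16), block(1, 16)    # lower layers
            self.a2, self.b2 = block(16, 32), block(16, 32)  # upper layers
            self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(64, n_classes))

        def forward(self, x_a, x_b):
            # Inputs are assumed to have the same spatial size.
            fa, fb = self.a1(x_a), self.b1(x_b)
            # Two-way middle-level interaction: each branch is enriched
            # with the other's middle-level feature maps (addition here
            # is an assumed interaction operator, chosen for simplicity).
            fa2 = self.a2(fa + fb)
            fb2 = self.b2(fb + fa)
            return self.head(torch.cat([fa2, fb2], dim=1))

    # Usage: "blues.00000.wav" follows GTZAN's file naming; in a real
    # multi-feature model each branch would receive a different feature.
    spec = mel_spectrogram("blues.00000.wav")
    x = torch.tensor(spec).float()[None, None]  # (batch, channel, H, W)
    logits = TwoBranchMLFI()(x, x)

Feeding the same tensor to both branches keeps the example self-contained; the point of the sketch is only where the interaction happens, namely between the lower and upper convolutional stages rather than at the final fusion layer.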

Timbral Texture Features
Other Features
Proposed Design and Approach
Network Structure
Middle-Level Learning Feature Interaction Method
A→A or B
Preprocessing
Training and Other Details
Classification Results on GTZAN
Classification Results of Two-Way Interaction Mode
Comparison of Each Mode
Conclusions

