Mixture of CNN Experts from Multiple Acoustic Feature Domain for Music Genre Classification

Yang Yi, Kuan-Yu Chen, Hung-Yan Gu

doi:10.1109/apsipaasc47483.2019.9023314

Abstract

In the field of music information retrieval (MIR), the audio spectrogram carries a great deal of information about musical content, which makes it a robust visual representation of a music signal. Recent studies show that convolutional neural networks (CNNs) can capture indicative acoustic patterns from spectrogram input and achieve remarkable performance on MIR-related tasks such as music genre classification (MGC). In this paper, we continue this line of research by exploring different types of spectrograms, each emphasizing different characteristics of music genre for the MGC task. To jointly leverage all of these features, we propose a mixture of experts (MoE) system. More formally, a set of MGC models is derived from the various spectrogram-based statistics, and each model is treated as an individual expert. A neural mixture model then collects and combines the predictions of the expert models and outputs a final decision for a given piece of music. In a nutshell, our major contributions are at least twofold. On the one hand, we comprehensively examine several spectrogram-based features for the MGC task. On the other hand, we propose a neural MoE system, which dynamically decides the weighting factor for each expert, to enhance performance on the MGC task. The source code is available at https://github.com/superlyy/apsipa_2019. Experimental results demonstrate that the proposed framework not only achieves better results than the individual expert models, but also provides classification accuracy comparable to state-of-the-art (SOTA) systems.
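The dynamic weighting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). It assumes the common MoE formulation in which a gating network emits one score per expert, the scores are normalized with a softmax, and the final prediction is the weighted sum of the experts' class probabilities. The function names, the number of experts and genres, and all numeric values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_predict(expert_probs, gate_scores):
    """Combine per-expert genre probabilities with gate weights.

    expert_probs: one probability vector over genres per expert.
    gate_scores:  one unnormalized score per expert; assumed here to be
                  the output of a small gating network for the current
                  clip (hypothetical, not taken from the paper).
    """
    weights = softmax(gate_scores)
    n_classes = len(expert_probs[0])
    mixed = [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]
    return weights, mixed


# Toy input: three experts (e.g. CNNs trained on different spectrogram
# types) and four genres. All values are made up for illustration.
expert_probs = [
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.50, 0.20, 0.10],
    [0.60, 0.20, 0.10, 0.10],
]
gate_scores = [2.0, 0.0, 1.0]

weights, mixed = mixture_predict(expert_probs, gate_scores)
# Index of the genre with the highest mixed probability.
predicted_genre = max(range(len(mixed)), key=mixed.__getitem__)
```

Because the gate weights sum to one and each expert outputs a valid distribution, the mixed output is itself a probability distribution over genres. In a trained system the gate scores would change from clip to clip, which is what makes the weighting dynamic.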

Full Text