Broadcast Swin Transformer for Music Genre Classification

Yu Duan

doi:10.54097/p1x81q26

Abstract

With the rapid growth of computing power and advances in machine learning models, music genre classification (MGC) has been widely explored in recent years and is crucial to the development of modern digital music media platforms. This paper proposes a novel architecture called Broadcast Swin Transformer (BST), which adds a Broadcast Mechanism to the Swin Transformer so that the low-level information of the spectrogram can be effectively conveyed and utilized at multiple scales. Evaluated on Mel-spectrograms extracted from the GTZAN audio dataset, the model achieves a Top-1 accuracy of 99.0%. Its strong performance is further supported by an ablation study and a comparison with state-of-the-art methods. The current limitations and future prospects are also discussed. Overall, these results can help guide further exploration of music genre classification.
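
The model's input is a Mel-spectrogram rather than raw audio. As an illustration of that preprocessing step only, the sketch below computes a log-Mel spectrogram with NumPy. It is not the author's pipeline: the abstract does not state the extraction settings, so the sample rate (22050 Hz, the rate of GTZAN clips), FFT size, hop length, number of Mel bands and the HTK-style Mel formula are all assumptions chosen for illustration.

```python
import numpy as np

# Assumed settings (not from the paper): HTK-style Mel scale,
# Hann window, power spectrogram, dB scaling.

def hz_to_mel(f):
    # HTK Mel scale: m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft//2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)  # centre frequency of each FFT bin
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        # Triangle: rises from left to center, falls from center to right, zero elsewhere
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=22050, n_fft=2048, hop=512, n_mels=128):
    """Log-Mel spectrogram in dB, shape (n_mels, n_frames); assumes len(signal) >= n_fft."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping windowed frames
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2  # (n_frames, n_bins)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T          # (n_mels, n_frames)
    return 10.0 * np.log10(mel + 1e-10)                        # small offset avoids log(0)
```

For a 1-second, 440 Hz sine sampled at 22050 Hz, this yields a 128 × 40 image whose brightest row is the Mel band centred near 440 Hz. A 2-D image of this kind is what a vision-style backbone such as the Swin Transformer would consume as patches.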
