Abstract

Pre-trained language models have become the prevailing approach to natural language processing tasks in recent years. Given the sequential similarities between symbolic music and natural language text, it is natural to apply pre-training methods to symbolic music data. However, the disparities between music and text make it difficult to comprehensively model the unique features of music through traditional text-based pre-training strategies alone. To address this challenge, we design a quad-attribute masking (QM) strategy and propose a key prediction (KP) task to improve the extraction of generic knowledge from symbolic music. We evaluate the impact of various pre-training strategies on several public symbolic music datasets, and our experiments show that the proposed multi-task pre-training model effectively captures music domain knowledge from symbolic music data and significantly improves performance on downstream tasks.
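
To give a concrete sense of what quad-attribute masking might look like, here is a minimal PyTorch sketch of jointly masking all four attributes of selected token positions. The attribute names (bar, position, pitch, duration), the `MASK_IDS` values, the masking rate, and the function itself are illustrative assumptions, not details taken from the paper.

```python
import torch

# Hypothetical [MASK] ids, one per attribute vocabulary; the actual
# tokenization and mask ids in the paper's model may differ.
MASK_IDS = {"bar": 0, "position": 0, "pitch": 0, "duration": 0}


def quad_attribute_mask(tokens: torch.Tensor, mask_prob: float = 0.15):
    """Jointly mask all four attributes at randomly chosen positions.

    tokens: LongTensor of shape (seq_len, 4), one column per attribute.
    Returns (corrupted inputs, labels), with labels set to -100 at
    unmasked positions so cross-entropy ignores them.
    """
    seq_len, num_attrs = tokens.shape
    assert num_attrs == 4, "expects one column per attribute"

    labels = torch.full_like(tokens, -100)
    chosen = torch.rand(seq_len) < mask_prob
    labels[chosen] = tokens[chosen]

    corrupted = tokens.clone()
    # Replace every attribute of a chosen token with its [MASK] id, so the
    # model cannot trivially recover one attribute from the other three.
    for j, name in enumerate(("bar", "position", "pitch", "duration")):
        corrupted[chosen, j] = MASK_IDS[name]
    return corrupted, labels
```

Masking the four attributes of a token together, rather than one attribute at a time, plausibly forces the model to reconstruct a note from its musical context instead of from the note's own remaining attributes; this is one reading of the QM strategy, not a confirmed implementation detail.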
