Automatic Music Transcription for the Thai Xylophone played with Soft Mallets

Apichai Huaysrijan, Sunee Pongpinigpinyo

doi:10.1109/jcsse54890.2022.9836266

Abstract

Automatic music transcription (AMT) is the conversion of audio into music notation, which supports music education, production, and creation. The Thai xylophone is a Thai classical music instrument that is commonly played with one of two types of mallets: soft or hard. This paper studies AMT for the Thai xylophone played with soft mallets. We compare two feature extraction methods, Mel-spectrogram and Mel-frequency cepstral coefficients (MFCC), as input to the Onsets and Frames (OaF) deep learning model, which is the state of the art for AMT. As the dataset, we collected 30 songs performed on the Thai xylophone with soft mallets, together with their music notation. The results show that the Mel-spectrogram outperforms MFCC: combined with the OaF model, it performed best, with an F1-score of 87.04% on the frame detector and 94.35% on the onset detector. We also conduct an ablation study.
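To make the reported frame-level F1-score concrete, the sketch below shows one common way such a score can be computed. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `frame_f1`, the binary piano-roll layout (one row per time frame, one column per xylophone bar), and the micro-averaging over all cells are assumptions, since the abstract does not describe the evaluation procedure. The toy rolls are made-up values, not data from the paper.

```python
def frame_f1(reference, estimate):
    """Return (precision, recall, F1) over all (frame, pitch) cells.

    Both arguments are hypothetical binary piano rolls: lists of frames,
    where each frame is a list of 0/1 flags, one per pitch.
    """
    if len(reference) != len(estimate):
        raise ValueError("piano rolls must have the same number of frames")

    tp = fp = fn = 0
    for ref_frame, est_frame in zip(reference, estimate):
        if len(ref_frame) != len(est_frame):
            raise ValueError("frames must have the same number of pitches")
        for ref_on, est_on in zip(ref_frame, est_frame):
            if ref_on and est_on:
                tp += 1   # note active in both reference and estimate
            elif est_on:
                fp += 1   # note predicted but absent from the reference
            elif ref_on:
                fn += 1   # note in the reference but missed by the model

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: 4 frames, 3 pitches (illustrative values only).
reference = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
estimate = [[1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 1, 1]]
precision, recall, f1 = frame_f1(reference, estimate)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

An onset-level score is usually computed differently, by matching predicted note onsets to reference onsets within a small time tolerance rather than by comparing individual frames.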

Full Text