Abstract

Automatic music transcription (AMT) aims to convert raw performance audio signals into a symbolic digital representation of the music, enabling downstream computational musicology. In polyphonic music, multiple notes may sound simultaneously, and the combinatorial space of possible note co-occurrences explodes in dimension, making accurate transcription difficult. To overcome this challenge, a deep learning model based on a graph convolutional network (CR-GCN) is proposed that copes with the dimension-explosion problem by exploiting the interdependence between musical notes. The model consists of two parts: feature learning and label learning. Feature learning combines a convolutional neural network (CNN) and a recurrent neural network (RNN) in series to extract spatial and temporal features from the input music signal. Label learning uses a graph convolutional network (GCN) to model the interdependence between notes. By jointly training the feature and label networks, the CR-GCN model is trainable end to end. Experiments on public polyphonic music datasets show that the proposed method recovers more co-occurring notes and outperforms existing methods on both frame-level and note-level metrics. Moreover, visual analysis shows that the learned interdependence between notes has good explainability in terms of music theory.
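
To make the two-branch design concrete, the sketch below shows one plausible way a CNN+RNN feature branch and a GCN label branch could be combined and trained jointly. It is a minimal illustration assuming PyTorch; the class name `CRGCN`, all layer sizes, and the note-adjacency matrix `adj` are hypothetical placeholders, not the authors' published implementation.

```python
# Illustrative sketch of the CNN+RNN feature branch and GCN label branch,
# assuming PyTorch; layer sizes, names, and the adjacency matrix are
# placeholders, not the authors' actual architecture.
import torch
import torch.nn as nn

class CRGCN(nn.Module):
    def __init__(self, n_bins=229, n_notes=88, hidden=256, emb_dim=64):
        super().__init__()
        # Feature learning: CNN over the time-frequency input,
        # followed by a bidirectional GRU over the frame sequence.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),          # pool along frequency only
        )
        self.rnn = nn.GRU(32 * (n_bins // 2), hidden,
                          batch_first=True, bidirectional=True)
        # Label learning: two GCN layers propagate note embeddings over a
        # note-co-occurrence graph, yielding one classifier vector per note.
        self.note_emb = nn.Parameter(torch.randn(n_notes, emb_dim))
        self.gcn1 = nn.Linear(emb_dim, hidden)
        self.gcn2 = nn.Linear(hidden, 2 * hidden)

    def forward(self, spec, adj):
        # spec: (batch, time, n_bins) log-spectrogram
        # adj:  (n_notes, n_notes) normalized note-adjacency matrix,
        #       e.g. built from note co-occurrence statistics of the training set.
        b, t, f = spec.shape
        x = self.cnn(spec.unsqueeze(1))            # (b, 32, t, n_bins // 2)
        x = x.permute(0, 2, 1, 3).reshape(b, t, -1)
        frames, _ = self.rnn(x)                    # (b, t, 2 * hidden)
        w = torch.relu(self.gcn1(adj @ self.note_emb))
        w = self.gcn2(adj @ w)                     # (n_notes, 2 * hidden)
        # Per-note frame activations via dot products with the
        # graph-refined note classifiers.
        return torch.sigmoid(frames @ w.t())       # (b, t, n_notes)
```

In such a setup, both branches would typically be optimized together with a binary cross-entropy loss against frame-level piano-roll labels, which is what makes the joint feature/label model trainable end to end.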
