As research on musical works has deepened, music transcription algorithms have received increasing attention. This study examined the recognition of piano-playing notes using a music transcription algorithm. First, the characteristics of three input features, MelSpec, LogSpec, and the constant Q-transform (CQT), were briefly introduced. Then, a convolutional recurrent neural network (CRNN) transcription algorithm comprising four convolutional blocks and one bidirectional long short-term memory (BiLSTM) structure was designed. The recognition performance of this method was analyzed on the MAPS dataset. Among the three input features, LogSpec yielded the best recognition performance for piano-playing notes. Within the CRNN structure, recognition performance was best when four convolutional blocks were used. The CRNN algorithm achieved F1-values of 84.9%, 92.24%, and 79.27% for frames, notes, and offsets, respectively, outperforming the convolutional neural network (CNN), BiLSTM, and CNN-hidden Markov model algorithms. The results verify that the CRNN transcription algorithm is effective for the recognition of piano-playing notes and can be applied in practice.
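The frame-level F1-value reported above can be illustrated with a minimal sketch. It assumes the standard definition, F1 = 2PR / (P + R), computed over binary piano rolls (one 0/1 entry per frame and pitch). The abstract does not state the exact evaluation protocol (activation thresholds, onset or offset tolerances), so the function name `frame_f1` and the toy data are hypothetical and not the authors' implementation.

```python
def frame_f1(reference, estimate):
    """Frame-level precision, recall, and F1 for binary piano rolls.

    Each roll is a list of frames; each frame is a list of 0/1 flags,
    one per pitch. Both rolls are assumed to have the same shape.
    """
    tp = fp = fn = 0
    for ref_frame, est_frame in zip(reference, estimate):
        for ref_on, est_on in zip(ref_frame, est_frame):
            if ref_on and est_on:
                tp += 1          # pitch active in both rolls
            elif est_on:
                fp += 1          # predicted but not in the reference
            elif ref_on:
                fn += 1          # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (hypothetical data): 4 frames, 3 pitches.
reference = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
estimate = [[1, 0, 0], [1, 0, 0], [0, 1, 1], [0, 0, 0]]
precision, recall, f1 = frame_f1(reference, estimate)
print(precision, recall, f1)
```

Note-level and offset-level scores are usually computed by matching note events within a time tolerance rather than by comparing frames, so they would need a different matching step than the one sketched here.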