Abstract

Voice separation is an important component of Music Information Retrieval (MIR). In this paper, we present an HMM which can be used to separate music performance data in the form of MIDI into monophonic voices. It works on two basic principles: that consecutive notes within a single voice tend to occur on similar pitches, and that the temporal gaps between them, if any, tend to be short. We also present an incremental algorithm which can perform inference on the model efficiently. We show that our approach achieves a significant improvement over existing approaches when run on a corpus of 78 compositions by J.S. Bach, each of which has been separated into the gold standard voices suggested by the original score. We also show that it can be used to perform voice separation on live MIDI data without an appreciable loss in accuracy. The code for the model described here is available at https://github.com/apmcleod/voice-splitting.
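As a rough illustration of the two principles above (and not the paper's actual HMM or inference algorithm), the Java sketch below scores how well a candidate note fits an existing voice: one term rewards pitch proximity to the voice's last note, and another rewards a small temporal gap after it. All class names, constants, and parameter values here are hypothetical choices for the example only.

```java
// Hypothetical sketch of the two voice-separation principles; not the
// model from the paper or the linked repository.
public class VoiceFitSketch {
    /** A note with a MIDI pitch and onset/offset times in milliseconds. */
    record Note(int pitch, long onsetMs, long offsetMs) {}

    /** Gaussian-like pitch proximity: highest when the pitches match. */
    static double pitchScore(int lastPitch, int newPitch, double std) {
        double d = newPitch - lastPitch;
        return Math.exp(-(d * d) / (2 * std * std));
    }

    /** Exponential decay in the temporal gap: highest when the gap is zero. */
    static double gapScore(long lastOffsetMs, long newOnsetMs, double scaleMs) {
        long gap = Math.max(0, newOnsetMs - lastOffsetMs);
        return Math.exp(-gap / scaleMs);
    }

    /** Combined fit of a candidate note following the voice's last note. */
    static double fit(Note last, Note candidate) {
        // std of 4 semitones and scale of 500 ms are illustrative values.
        return pitchScore(last.pitch(), candidate.pitch(), 4.0)
             * gapScore(last.offsetMs(), candidate.onsetMs(), 500.0);
    }

    public static void main(String[] args) {
        Note last = new Note(60, 0, 480);      // C4, ending at 480 ms
        Note near = new Note(62, 500, 980);    // D4: small leap, 20 ms gap
        Note far  = new Note(79, 2000, 2480);  // G5: large leap, long gap
        System.out.printf("near-note fit: %.4f%n", fit(last, near));
        System.out.printf("far-note fit:  %.4f%n", fit(last, far));
    }
}
```

Running the sketch, the nearby note scores far higher than the distant one, which is the intuition the model's probabilities formalize: an incremental inference procedure can then assign each incoming note to the voice it fits best, which is what makes live MIDI processing feasible.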
