Sparse representations of polyphonic music

Mark D Plumbley, Samer A Abdallah, Thomas Blumensath, Michael E Davies

doi:10.1016/j.sigpro.2005.06.007

Abstract

We consider two approaches for sparse decomposition of polyphonic music: a time-domain approach based on a shift-invariant model, and a frequency-domain approach based on phase-invariant power spectra. When trained on a recording of a MIDI-controlled acoustic piano, both methods produce dictionary vectors, or sets of vectors, that represent the underlying notes, and both produce component activations related to the original MIDI score. The time-domain method is more computationally expensive, but it produces sample-accurate, spike-like activations and can be used for direct time-domain reconstruction. The frequency-domain method discards phase information, but it is faster than the time-domain method and retains more of the higher-frequency harmonics. These results suggest that the two methods could provide a powerful yet complementary approach to automatic music transcription or object-based coding of musical audio.
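As a rough illustration of the contrast drawn above, the sketch below builds a toy signal from a shift-invariant (convolutive) model, in which each dictionary atom is placed at sparse "spike" onsets, and then shows two things. First, a single greedy matching-pursuit step recovers the spike position to the sample. Second, the power spectrum of a frame does not change under a circular shift, which is the sense in which a power-spectrum model is phase-invariant. This is not the authors' algorithm: the function names (`synthesize`, `best_shift`, `power_spectrum`, `circular_shift`), the decaying-sinusoid atom, and the use of matching pursuit for inference are all hypothetical choices made only for illustration.

```python
import cmath
import math


def synthesize(atoms, activations, length):
    """Shift-invariant (convolutive) model: x[t] = sum_k sum_s a_k[s] * d_k[t - s].

    activations[k] is a sparse dict {onset sample s: amplitude a_k[s]},
    i.e. a spike train for atom k (a hypothetical representation).
    """
    x = [0.0] * length
    for atom, spikes in zip(atoms, activations):
        for s, amp in spikes.items():
            for i, v in enumerate(atom):
                if 0 <= s + i < length:
                    x[s + i] += amp * v
    return x


def best_shift(x, atom):
    """One greedy matching-pursuit step for a single atom.

    Returns the shift s maximising |<x, atom placed at s>| and the
    least-squares amplitude <x, atom placed at s> / ||atom||^2.
    """
    energy = sum(v * v for v in atom)
    best_s, best_inner = 0, 0.0
    for s in range(len(x) - len(atom) + 1):
        inner = sum(x[s + i] * v for i, v in enumerate(atom))
        if abs(inner) > abs(best_inner):
            best_s, best_inner = s, inner
    return best_s, best_inner / energy


def power_spectrum(frame):
    """|DFT|^2 for bins 0..n//2 (naive O(n^2) DFT); the phase of each bin is dropped."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))) ** 2
        for f in range(n // 2 + 1)
    ]


def circular_shift(frame, k):
    """Rotate a frame right by k samples."""
    k %= len(frame)
    return frame[-k:] + frame[:-k]


if __name__ == "__main__":
    # Hypothetical toy "note" atom: a decaying sinusoid, not taken from the paper.
    atom = [math.exp(-i / 8) * math.sin(2 * math.pi * i / 8) for i in range(32)]
    x = synthesize([atom], [{40: 0.7}], length=128)

    # Time domain: the spike onset and amplitude are recovered to the sample.
    s, a = best_shift(x, atom)
    print(s, round(a, 3))  # 40 0.7

    # Spectral domain: a circular shift within the frame leaves |X|^2 unchanged.
    frame = x[32:96]
    p0 = power_spectrum(frame)
    p1 = power_spectrum(circular_shift(frame, 5))
    print(max(abs(u - v) for u, v in zip(p0, p1)) < 1e-9)  # True
```

The naive DFT keeps the sketch dependency-free. A practical version would use an FFT (for example `numpy.fft.rfft`) on windowed, overlapping frames, where the invariance to shifts within a frame holds only approximately rather than exactly as it does for the circular shift shown here.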
