Factors in factorization: Does better audio source separation imply better polyphonic music transcription?

Tiago Fernandes Tavares, George Tzanetakis, Peter Driessen

doi:10.1109/mmsp.2013.6659326

Abstract

Spectrogram factorization methods such as Non-Negative Matrix Factorization (NMF) are frequently used to separate individual sound sources from complex sound mixtures. More recently, they have also been used as a first stage in the automatic transcription of polyphonic music. Sound source separation is a different (but related) problem from automatic music transcription. The output of the former is a set of separated audio signals, one per sound source, whereas the output of the latter is a symbolic representation (a music score) that encodes which discrete pitches/notes are played and when. Many variants of factorization methods have been proposed. Two important design choices are how the spectra are represented and which distance measure is used to compare them in the optimization that performs the factorization. A common assumption has been that a variant yielding better signal separation will also yield better automatic transcription. In this work, we investigate this question experimentally and show that this relationship does not necessarily hold.
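To make the two design choices and the two downstream tasks concrete, the sketch below factorizes a magnitude spectrogram V as V ≈ WH, with spectral templates in the columns of W and their activations over time in the rows of H. It is a minimal illustration, not the authors' implementation: it assumes plain magnitude spectra and the standard Lee and Seung multiplicative updates, with a switch between the squared Euclidean distance and the generalized Kullback-Leibler divergence. The function names `nmf`, `separate_component` and `transcribe`, and the relative threshold used to binarize activations, are hypothetical choices made for this example. The same factorization feeds separation (a soft mask built from one component) and transcription (thresholded activations), which is why the two can be evaluated independently.

```python
import numpy as np


def nmf(V, rank, divergence="kl", n_iter=200, eps=1e-12, seed=0):
    """Factorize a non-negative spectrogram V (freq x time) as V ~= W @ H.

    Uses Lee-Seung multiplicative updates for either the squared Euclidean
    distance ("euclidean") or the generalized KL divergence ("kl").
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps   # spectral templates, one per column
    H = rng.random((rank, T)) + eps   # activations, one row per template
    for _ in range(n_iter):
        if divergence == "euclidean":
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        elif divergence == "kl":
            WH = W @ H + eps
            H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        else:
            raise ValueError(f"unknown divergence: {divergence}")
    return W, H


def separate_component(V, W, H, k, eps=1e-12):
    """Separation view: soft (Wiener-style) mask keeping the part of V
    explained by component k."""
    Vk = np.outer(W[:, k], H[k])      # component k's share of the model
    return V * Vk / (W @ H + eps)     # masked magnitude spectrogram


def transcribe(H, rel_threshold=0.3):
    """Transcription view: binarize each activation row into a
    component-by-frame 'piano roll' (hypothetical threshold rule)."""
    return H > rel_threshold * H.max(axis=1, keepdims=True)


# Toy spectrogram: 4 frequency bins, 6 frames, two "notes" with fixed spectra;
# frame 3 is silent.
W_true = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.0], [0.0, 0.5]])
H_true = np.array([[1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 0, 1]], dtype=float)
V = W_true @ H_true

W, H = nmf(V, rank=2, divergence="kl", n_iter=500)
roll = transcribe(H)
source0 = separate_component(V, W, H, 0)
```

NMF returns its components in an arbitrary order, so in a real transcription system an additional step is needed to map each component (or group of components) to a pitch; the toy example sidesteps this by only inspecting the activation pattern.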

Full Text