Abstract

Spectrogram factorization methods such as Non-Negative Matrix Factorization (NMF) are frequently used to separate individual sound sources from complex sound mixtures. More recently, they have also been used as a first stage in the automatic transcription of polyphonic music. Sound source separation is related to, but different from, automatic music transcription: the output of the former is the separated audio signal of each sound source, whereas the output of the latter is a symbolic representation, such as a music score, that encodes which discrete pitches (notes) are played and when. Many variations of factorization methods have been proposed. Two important design choices are how spectra are represented and which distance measure is used to compare them in the optimization that performs the factorization. A common assumption has been that a variant that yields better signal separation will also yield better automatic transcription. In this work, we investigate this assumption experimentally and show that the relationship does not necessarily hold.
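
To make the role of the distance measure concrete, the following is a minimal sketch of NMF on a non-negative magnitude spectrogram V (frequency bins by time frames), factorized as V ≈ WH using the standard multiplicative updates. It is an illustration only, not the method evaluated in this work: the function name `nmf`, its parameters, and the choice of the Euclidean distance and the generalized Kullback-Leibler divergence as the two example measures are assumptions made here, since the abstract does not state which representations or measures are compared.

```python
import numpy as np


def nmf(V, rank, cost="kl", n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix V (bins x frames) as W @ H.

    Hypothetical helper for illustration. `cost` selects the distance
    measure minimized by the multiplicative updates:
    "euclidean" (squared Frobenius norm) or "kl" (generalized KL divergence).
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    # Random non-negative initialization of spectral templates (W)
    # and their time-varying activations (H).
    W = rng.random((n_bins, rank)) + eps
    H = rng.random((rank, n_frames)) + eps

    for _ in range(n_iter):
        if cost == "euclidean":
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        elif cost == "kl":
            WH = W @ H + eps
            H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        else:
            raise ValueError(f"unknown cost: {cost!r}")
    return W, H


if __name__ == "__main__":
    # Synthetic stand-in for a spectrogram: 3 templates mixed over 30 frames.
    rng = np.random.default_rng(1)
    V = rng.random((20, 3)) @ rng.random((3, 30))
    for cost in ("euclidean", "kl"):
        W, H = nmf(V, rank=3, cost=cost)
        err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
        print(cost, "relative reconstruction error:", err)
```

In a separation setting, the columns of W and rows of H would be used to reconstruct per-source signals, whereas in a transcription setting the activations in H would be thresholded into note events. The two uses evaluate the same factorization by different criteria, which is why a cost function that helps one need not help the other.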
