Abstract
Incorporating polyphony information has proven beneficial for multi-pitch estimation (MPE). Prior approaches leverage polyphony either by integrating it into deep learning frameworks or by applying it as a post-processing step. In both cases, the total number of active pitches in an audio frame is used to improve the accuracy of the estimated active pitches. Our study introduces a novel approach that focuses on polyphony per instrument (PPI), extending the conventional scope of MPE to MPE per instrument. We adapt an existing U-Net model, originally designed for standard MPE, into a multi-channel MPE model that incorporates and estimates PPI. These adjustments add an instrument recognition component to the model while keeping the enhanced model comparable to the original baseline. Our findings show that PPI improves MPE performance and that the modifications enable the model to perform timbre-separated transcription. To validate our results, we conduct cross-dataset evaluations on a novel dataset of popular electronic music, which covers a broad spectrum of non-traditional timbres. These evaluations underscore the stabilizing effect of polyphony information in general, and of PPI in particular, across diverse musical styles.
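The sketch below illustrates how a polyphony count can refine frame-level pitch estimates. It uses one simple post-processing rule: keep only the k most strongly activated pitches, where k is the estimated polyphony. The PPI variant applies the same rule separately to each instrument channel. The function names, the data layout (one activation list and one polyphony count per instrument), and the top-k rule itself are illustrative assumptions. They are not the specific method of this work.

```python
from typing import Dict, List, Set


def select_top_k(activations: List[float], k: int) -> Set[int]:
    """Return the indices of the k most strongly activated pitches in one frame.

    Hypothetical rule: the estimated polyphony k replaces a fixed threshold.
    """
    if k <= 0:
        return set()
    # Rank pitch indices by activation, highest first.
    ranked = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    return set(ranked[:k])


def select_per_instrument(
    channel_activations: Dict[str, List[float]],
    polyphony: Dict[str, int],
) -> Dict[str, Set[int]]:
    """Apply the top-k rule separately to each instrument channel (assumed PPI layout).

    Instruments missing from `polyphony` are treated as silent (k = 0).
    """
    return {
        name: select_top_k(acts, polyphony.get(name, 0))
        for name, acts in channel_activations.items()
    }


# Usage: one frame with two hypothetical instrument channels over four pitch bins.
frame = {"bass": [0.1, 0.9, 0.2, 0.05], "piano": [0.7, 0.3, 0.8, 0.6]}
ppi = {"bass": 1, "piano": 2}
per_instrument = select_per_instrument(frame, ppi)
# The union of the per-instrument sets gives an instrument-agnostic MPE result.
mixed = set().union(*per_instrument.values())
```

Here `per_instrument` is `{"bass": {1}, "piano": {0, 2}}`, and `mixed` is `{0, 1, 2}`. The per-channel counts let the piano's weaker pitch at index 3 be suppressed. A single global count applied to a mixed representation would handle that decision differently.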