Abstract

The identification of multiple simultaneous pitches is a challenging signal processing task that cannot at present be performed by machines as well as by trained human subjects. Moreover, successful human performance appears to depend on skill acquisition and knowledge of musical conventions; even human capabilities are likely to be poor in the absence of training and musical context. We present a framework, using Dynamic Bayesian Networks, that permits the principled incorporation of models of music theory, musical instruments, and human pitch perception. A particular advantage of this approach is that each of these models can be developed independently, relying on expert knowledge or machine learning as necessary; models of appropriate complexity can then be selected for a specific application. In the present work, we focus on learned models of musical context, specifically Deep Markov Models, and use them to improve inferences about simultaneous pitches. The main drawback of this framework is the intractability of the inference computations and the computational expense of approximation methods. We explore particle filtering as an approach to addressing these problems, with the ultimate aim of making the approach usable within a musical performance system. [Work supported by NSF BCS-1539276 and an IBM AIRC grant.]
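To make the kind of approximate inference described above concrete, the sketch below implements a minimal bootstrap particle filter over binary pitch-activation states. Everything in it is an illustrative assumption rather than the authors' actual models: the 88-pitch candidate set, the random-walk transition model, and the toy Gaussian salience likelihood stand in for the learned Deep Markov Model, instrument models, and perception models the abstract refers to.

```python
# Minimal bootstrap particle filter sketch for tracking a set of active pitches.
# The transition and observation models here are illustrative placeholders,
# not the learned Deep Markov Model or instrument models from the abstract.
import numpy as np

rng = np.random.default_rng(0)

N_PITCHES = 88        # piano-range pitch candidates (assumption)
N_PARTICLES = 500

def transition(particles):
    """Propagate particles: each pitch independently toggles with small probability."""
    flip = rng.random(particles.shape) < 0.02
    return np.logical_xor(particles, flip)

def log_likelihood(particles, frame):
    """Score each particle against an observed per-pitch salience frame
    using a toy Gaussian match (placeholder observation model)."""
    diff = particles.astype(float) - frame
    return -0.5 * np.sum(diff ** 2, axis=1)

def particle_filter(frames):
    # Start with all pitches inactive in every particle.
    particles = np.zeros((N_PARTICLES, N_PITCHES), dtype=bool)
    estimates = []
    for frame in frames:
        particles = transition(particles)
        logw = log_likelihood(particles, frame)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Posterior estimate: per-pitch marginal probability of being active.
        estimates.append(w @ particles)
        # Multinomial resampling.
        idx = rng.choice(N_PARTICLES, size=N_PARTICLES, p=w)
        particles = particles[idx]
    return np.array(estimates)

# Usage with synthetic salience frames (purely illustrative):
frames = rng.random((10, N_PITCHES))
marginals = particle_filter(frames)
print(marginals.shape)  # (10, 88): per-frame, per-pitch activation probabilities
```

In a real system the placeholder transition would be replaced by the learned musical-context model and the likelihood by the instrument and pitch-perception models, but the filtering loop itself would retain this predict, weight, resample structure.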
