Abstract

In this paper, we discuss a method for detailed analysis of music performances using multiresolution analysis, which allows simultaneous estimation of pitch, precise onset times, durations, and intensities from polyphonic audio. The motivation is to obtain information detailed enough to develop a performance model of a human player. Characteristics of human performance can be observed as local and global tempo changes, sound intensity (volume, or velocity in MIDI), and articulations such as slur and staccato. Detailed estimation and extraction of such features from a musical audio signal is useful for music information retrieval systems, automatic transcription systems, and automatic performance systems that learn the relationship between music features and player performance. Our proposed system is based on non-negative matrix factorization (NMF) using hierarchical Bayesian inference, which stochastically models harmonic and nonharmonic structures, note durations, intensities, and onset information. The estimation process comprises two steps. In the first step, variational Bayesian inference and a Gaussian mixture model are used to roughly estimate pitch, onset, intensity, and duration. These values are used as priors for the second, more detailed step, in which the time resolution is doubled and the estimation is repeated to refine the results. The evaluation results show that our proposed multiresolution Bayesian model estimates onset times and durations more precisely than our non-multiresolution Bayesian model.
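The coarse-to-fine idea behind the two-step process can be illustrated with a minimal sketch. This is not the paper's hierarchical Bayesian model: it uses plain Euclidean-cost NMF with multiplicative updates on a synthetic spectrogram, and the coarse-resolution activations are simply upsampled to initialize the fine-resolution factorization, standing in for the priors described above. All variable names and the data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(V, W=None, H=None, rank=2, iters=300):
    """Basic NMF via multiplicative updates (Euclidean cost).
    W holds spectral templates, H the time-varying activations."""
    F, T = V.shape
    if W is None:
        W = rng.random((F, rank)) + 1e-3
    if H is None:
        H = rng.random((rank, T)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Synthetic "spectrogram": two spectral templates, 16 time frames.
W_true = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 0.8]])
H_true = np.abs(rng.random((2, 16)))
V_fine = W_true @ H_true
# Halved time resolution: average adjacent frame pairs.
V_coarse = V_fine.reshape(3, 8, 2).mean(axis=2)

# Step 1: rough estimate at coarse time resolution.
W1, H1 = nmf(V_coarse, rank=2)

# Step 2: double the time resolution and refine, starting from the
# coarse result (a stand-in for using step 1 as a prior).
H_init = np.repeat(H1, 2, axis=1)
W2, H2 = nmf(V_fine, W=W1.copy(), H=H_init)

rel_err = np.linalg.norm(V_fine - W2 @ H2) / np.linalg.norm(V_fine)
```

Because the fine-resolution run starts near a solution found on the smoothed data, it refines onset-level detail rather than searching from scratch, which is the intuition behind the multiresolution refinement step.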

