Abstract

This paper describes automatic music transcription with chord estimation for music audio signals. We focus on the fact that concurrent structures of musical notes, such as chords, form the basis of harmony and are taken into account in music composition. Since chords and musical notes are deeply linked with each other, we propose joint pitch and chord estimation based on a Bayesian hierarchical model that consists of an acoustic model representing the generative process of a spectrogram and a language model representing the generative process of a piano roll. The acoustic model is formulated as a variant of non-negative matrix factorization that has binary variables indicating a piano roll. The language model is formulated as a hidden Markov model that has chord labels as the latent variables and emits a piano roll, so the sequential dependency of the piano roll is captured by the language model. The two models are integrated through the piano roll in a hierarchical Bayesian manner, and all the latent variables and parameters are estimated using Gibbs sampling. The experimental results showed the great potential of the proposed method for unified music transcription and grammar induction.
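As a rough illustration of the generative structure described in the abstract, the sketch below draws a chord sequence from a first-order Markov chain (the language model), emits a binary piano roll conditioned on each chord, and then generates a spectrogram through an NMF-like product gated by the piano roll (the acoustic model). All dimensions, distributions, and variable names here are assumptions made for illustration; the paper's actual priors and its Gibbs sampling inference are not reproduced.

    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed sizes (illustrative only): F frequency bins, T frames, K pitches, C chords.
    F, T, K, C = 128, 200, 48, 12

    # Language model: Markov chain over chord labels that emits a binary piano roll S.
    pi = np.full(C, 1.0 / C)                    # initial chord distribution
    A = rng.dirichlet(np.ones(C), size=C)       # chord transition matrix (C x C)
    p_on = rng.uniform(0.02, 0.3, size=(C, K))  # per-chord note activation probabilities

    chords = np.empty(T, dtype=int)
    S = np.empty((K, T), dtype=int)             # binary piano roll
    for t in range(T):
        chords[t] = rng.choice(C, p=pi if t == 0 else A[chords[t - 1]])
        S[:, t] = rng.binomial(1, p_on[chords[t]])

    # Acoustic model: NMF-like spectrogram generation gated by the piano roll.
    W = rng.gamma(2.0, 1.0, size=(F, K))        # basis spectra, one per pitch
    H = rng.gamma(2.0, 1.0, size=(K, T))        # activations (gains)
    X = rng.poisson(W @ (H * S))                # observed magnitude spectrogram

Transcription then corresponds to inferring S (and the chord labels) from X, which the paper does jointly by Gibbs sampling.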

Highlights

  • Automatic music transcription (AMT) refers to the estimation of pitches, onset times, and durations of musical notes from music signals and is important for music information retrieval.

  • 1) Evaluation of language modeling: We evaluated the effectiveness of each component of the language model by testing different priors on the piano roll S.

  • 2) Evaluation of prior training: We evaluated the effectiveness of prior training of the language model via leave-one-out cross-validation, in which one musical piece was used for evaluation and the others were used for training the language model.


Summary

Introduction

Automatic music transcription (AMT) refers to the estimation of pitches, onset times, and durations of musical notes from music signals and has long been considered important for music information retrieval. Since multiple pitches usually overlap in polyphonic music and each pitch consists of many overtone components, estimation of multiple pitches is still an open problem. Although such multipitch estimation is often called AMT, quantization of the onset times and durations of musical notes is also required to complete AMT. A major approach to multipitch estimation and AMT is to use non-negative matrix factorization (NMF) [1,2,3,4]. NMF approximates the magnitude spectrogram of an observed music signal as the product of a basis matrix (spectral template vectors, each of which corresponds to a pitch) and an activation matrix (gain vectors, each of which is associated with a spectral template).
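As a minimal, generic sketch of this kind of decomposition (not the paper's extended model, which additionally involves binary piano-roll variables and Bayesian priors), the following function factorizes a magnitude spectrogram with the standard multiplicative updates for the KL divergence; the function name, dimensions, and iteration count are placeholders.

    import numpy as np

    def nmf_kl(X, K, n_iter=200, eps=1e-9, seed=0):
        """Plain KL-divergence NMF with multiplicative updates: X ~= W @ H.

        X: non-negative magnitude spectrogram, shape (F, T)
        K: number of basis spectra (e.g., one per candidate pitch)
        """
        rng = np.random.default_rng(seed)
        F, T = X.shape
        W = rng.random((F, K)) + eps  # basis matrix: spectral templates
        H = rng.random((K, T)) + eps  # activation matrix: per-template gains
        for _ in range(n_iter):
            V = W @ H + eps
            H *= (W.T @ (X / V)) / (W.sum(axis=0)[:, None] + eps)
            V = W @ H + eps
            W *= ((X / V) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        return W, H

    # Toy usage on a random non-negative matrix; a real input would be |STFT| of the audio.
    X = np.abs(np.random.default_rng(1).normal(size=(128, 300)))
    W, H = nmf_kl(X, K=48)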

