Abstract

Given a musical audio recording, the goal of automatic music transcription is to determine a score-like representation of the piece underlying the recording. Despite significant interest within the research community, several studies have reported a 'glass ceiling' effect, an apparent limit on transcription accuracy that current methods seem unable to overcome. In this paper, we explore how far this effect can be mitigated by focusing on a specific instrument class and by using additional information on the recording conditions that is available in studio or home recording scenarios. In particular, exploiting the availability of single-note recordings for the instrument in use, we develop a novel signal model that employs variable-length spectro-temporal patterns as its central building blocks and is tailored to pitched percussive instruments such as the piano. Temporal dependencies between spectral templates are modeled in a way that resembles factorial scaled hidden Markov models (FS-HMM) and other methods combining Non-Negative Matrix Factorization with Markov processes. In contrast to FS-HMMs, our parameter estimation is developed in a global, relaxed form within the extensible alternating direction method of multipliers (ADMM) framework, which enables the systematic combination of basic regularizers promoting sparsity and local stationarity in note activity with more complex regularizers imposing temporal semantics. The proposed method achieves an F-measure of 93-95% for note onsets on pieces recorded on a Yamaha Disklavier (MAPS DB).

Highlights

  • Automatic Music Transcription (AMT) has a long history in music processing [1]

  • We develop a regularizer which encourages activations in A to follow transition rules for the templates described by the graphical model depicted in Fig. 2, i.e., we integrate the concepts behind factorial scaled hidden Markov models (FS-HMM) and their variants [6]

  • We presented a method for transcribing pitched-percussive instruments such as the piano in controlled recording conditions

Summary

INTRODUCTION

Automatic Music Transcription (AMT) has a long history in music processing [1]. Because it identifies higher-level musical concepts such as notes in digital music recordings, it is often considered a key technology for the semantic analysis of music, with applications ranging from retrieval tasks in music informatics, through computational musicology and performance analysis, to creative music technology [2]. To obtain a practical transcription system, we assume that at the beginning of a recording session the user can play a note in pianissimo (low intensity), which our system uses to derive a threshold for differentiating between an active note and estimation noise. Given this scenario, we can tailor our proposed signal model to precisely this instrument class, which is necessary to account for the highly non-stationary behavior of the piano sound production process. Instead of strictly enforcing Markov properties as in FS-HMMs, this enables us to approximate the temporal transitions between spectral templates in a relaxed form by stating the parameter estimation problem as a structured sparse coding problem, which is controlled by simple convex regularizers. Using these regularizers, we can steer the solution close to a semantically meaningful progression similar to an FS-HMM solution.
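The role of the pianissimo reference note can be illustrated with a minimal sketch. The text does not state how the threshold is derived from the reference note, so the rule used here (a fixed fraction of the reference note's peak activation) is purely an assumption for illustration, as are the function names `derive_threshold` and `active_frames`, the parameter `fraction`, and the toy activation values.

```python
import numpy as np


def derive_threshold(reference_activations, fraction=0.5):
    """Hypothetical rule (not from the paper): use a fixed fraction of the
    peak activation observed for the pianissimo reference note."""
    return fraction * float(np.max(reference_activations))


def active_frames(activations, threshold):
    """Mark entries at or above the threshold as note activity;
    everything below is treated as estimation noise."""
    return activations >= threshold


# Toy activation curve of the softly played reference note (assumed values).
reference = np.array([0.0, 0.02, 0.40, 0.30, 0.10])
threshold = derive_threshold(reference)

# Toy activations for two pitches over four frames (assumed values).
piece = np.array([
    [0.01, 0.05, 0.90, 0.60],
    [0.00, 0.25, 0.03, 0.00],
])
mask = active_frames(piece, threshold)
print(threshold)
print(mask)
```

The point of the sketch is only that a single quiet note gives a data-driven lower bound on what a genuine note looks like in the activations, so that smaller values can be discarded as noise.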

RELATED WORK
PROPOSED MODEL
Encouraging Data Fidelity and Non-Negativity
Encouraging Sparsity
Encouraging a Temporally Meaningful Template Order
Constraining the Concurrency of Templates
Differentiating Between Estimation Noise and Note Events
Encouraging Meaningful Long-Term Note Activity
PARAMETER ESTIMATION USING THE ALTERNATING DIRECTION METHOD OF MULTIPLIERS
Consensus Form ADMM
Linearized ADMM
MINIMIZING THE INDIVIDUAL TERMS
KL Data Fidelity Term
Non-Negativity Term
LASSO and Total Diagonal Variation Terms
Markov-State Regularizer
Thresholding Set Term
Binary Markov-State and Strict Coupling Regularizer
EXPERIMENTS
Method
Influence of Individual Regularizers and Error Analysis
Findings
CONCLUSIONS
