Piano Transcription in the Studio Using an Extensible Alternating Directions Framework

Sebastian Ewert, Mark Sandler

doi:10.1109/taslp.2016.2593801

Abstract

Given a musical audio recording, the goal of automatic music transcription is to determine a score-like representation of the piece underlying the recording. Despite significant interest within the research community, several studies have reported on a 'glass ceiling' effect, an apparent limit on the transcription accuracy that current methods seem incapable of overcoming. In this paper, we explore how much this effect can be mitigated by focusing on a specific instrument class and making use of additional information on the recording conditions available in studio or home recording scenarios. In particular, exploiting the availability of single-note recordings for the instrument in use, we develop a novel signal model employing variable-length spectro-temporal patterns as its central building blocks, tailored for pitched percussive instruments such as the piano. Temporal dependencies between spectral templates are modeled, resembling characteristics of factorial scaled hidden Markov models (FS-HMMs) and other methods combining Non-Negative Matrix Factorization with Markov processes. In contrast to FS-HMMs, our parameter estimation is developed in a global, relaxed form within the extensible alternating direction method of multipliers (ADMM) framework, which enables the systematic combination of basic regularizers promoting sparsity and local stationarity in note activity with more complex regularizers imposing temporal semantics. The proposed method achieves an F-measure of 93-95% for note onsets on pieces recorded on a Yamaha Disklavier (MAPS DB).
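To illustrate how an ADMM framework lets independent regularizers be combined, here is a minimal consensus-form ADMM sketch in Python with NumPy. It is not the authors' implementation: it swaps the KL data fidelity term for a squared Euclidean term (so that its proximal step is a small linear solve) and uses only two simple regularizers, an L1 (LASSO) sparsity term and non-negativity. Each term keeps its own local copy of the activation vector and applies its own proximal operator, and an averaging step enforces consensus. The function name `consensus_admm` and the parameters `lam`, `rho` and `num_iter` are hypothetical.

```python
import numpy as np


def consensus_admm(W, x, lam=1e-3, rho=1.0, num_iter=2000):
    """Consensus ADMM for  min_h 0.5*||W h - x||^2 + lam*||h||_1 + indicator(h >= 0).

    Sketch only: squared Euclidean fidelity stands in for a KL term.
    Each of the three terms owns a local copy h[i]; z is the consensus variable.
    """
    n = W.shape[1]
    num_terms = 3
    z = np.zeros(n)
    h = np.zeros((num_terms, n))
    u = np.zeros((num_terms, n))  # scaled dual variables, one row per term

    # Prox of the fidelity term solves (W^T W + rho I) h = W^T x + rho v.
    M = W.T @ W + rho * np.eye(n)
    Wtx = W.T @ x

    for _ in range(num_iter):
        v = z - u  # row i: point at which the prox of term i is evaluated
        h[0] = np.linalg.solve(M, Wtx + rho * v[0])                       # data fidelity
        h[1] = np.sign(v[1]) * np.maximum(np.abs(v[1]) - lam / rho, 0.0)  # soft-thresholding (L1)
        h[2] = np.maximum(v[2], 0.0)                                      # projection onto h >= 0
        z = np.mean(h + u, axis=0)  # consensus step: average the local estimates
        u += h - z                  # dual update on the constraints h[i] = z
    return z


# Usage: recover a sparse, non-negative activation vector from a noiseless mixture.
rng = np.random.default_rng(0)
W = rng.random((20, 5))                      # hypothetical non-negative "templates"
h_true = np.array([0.0, 2.0, 0.0, 1.0, 0.0])
x = W @ h_true
h_est = consensus_admm(W, x)
```

In this scheme, adding a further convex regularizer only requires one more local copy with its own proximal step; the fidelity and consensus steps stay unchanged, which is the kind of extensibility described above.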

Highlights

Automatic Music Transcription (AMT) has a long history in music processing [1]
We develop a regularizer which encourages activations in A to follow transition rules for the templates described by the graphical model depicted in Fig. 2, i.e., we integrate the concepts behind factorial scaled hidden Markov models (FS-HMMs) and their variants [6]
We presented a method for transcribing pitched-percussive instruments such as the piano in controlled recording conditions
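As an illustration of the kind of transition rule such a regularizer encourages, the following Python sketch encodes a hypothetical left-to-right rule for a single pitch: a silent pitch may stay silent or start at its first template, and a sounding template may persist, advance to the next template, or end the note. The rule, the state encoding (0 for silence, 1..K for the templates in temporal order) and all function names are assumptions for illustration, not the graphical model of Fig. 2. The sketch also makes hard per-frame decisions and merely counts violations, whereas the method described here uses a relaxed regularizer acting on continuous activations.

```python
import numpy as np


def allowed_transitions(num_templates):
    """Boolean matrix: allowed[i, j] is True if state i at frame t may be followed by state j.

    Hypothetical encoding: state 0 is silence, states 1..num_templates are
    the pitch's templates in temporal order.
    """
    n = num_templates + 1
    allowed = np.zeros((n, n), dtype=bool)
    allowed[0, 0] = True              # stay silent
    allowed[0, 1] = True              # a note always starts with its first template
    for k in range(1, n):
        allowed[k, k] = True          # the current template persists
        allowed[k, 0] = True          # the note ends
        if k + 1 < n:
            allowed[k, k + 1] = True  # advance to the next template
    return allowed


def states_from_activations(A, threshold):
    """Per frame, pick the strongest template of one pitch, or silence (0) below threshold.

    A has shape (num_templates, num_frames).
    """
    best = np.argmax(A, axis=0) + 1
    return np.where(A.max(axis=0) >= threshold, best, 0)


def count_violations(states, allowed):
    """Number of frame-to-frame transitions that the rule does not allow."""
    states = np.asarray(states)
    return int(np.sum(~allowed[states[:-1], states[1:]]))


allowed = allowed_transitions(3)
print(count_violations([0, 1, 1, 2, 3, 3, 0, 0], allowed))  # 0
print(count_violations([0, 2, 1, 0], allowed))              # 2: skips template 1, then goes backwards
```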

Summary

INTRODUCTION

Automatic Music Transcription (AMT) has a long history in music processing [1]. Because it identifies higher-level musical concepts such as notes in digital music recordings, it is often considered a key technology for the semantic analysis of music, with applications ranging from retrieval tasks in music informatics through computational musicology and performance analysis to creative music technology [2]. To obtain a practical transcription system, we assume that at the beginning of a recording session the user can play a note in pianissimo (low intensity), which our system uses to derive a threshold that differentiates between an active note and estimation noise. Given this scenario, we can tailor our proposed signal model to precisely this instrument class, which is necessary to account for the highly non-stationary behavior of the piano sound production process. This enables us, instead of strictly enforcing Markov properties as in FS-HMMs, to approximate the temporal transitions between spectral templates in a relaxed form by stating the parameter estimation problem as a structured sparse coding problem controlled by simple convex regularizers. Using these regularizers, we can steer the solution close to a semantically meaningful progression similar to an FS-HMM solution.
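The idea of a threshold derived from the pianissimo note can be illustrated with a small Python sketch. The specific rule used here, taking a fixed fraction (`scale`) of the peak activation obtained for the reference note, is a hypothetical choice, as are the function names; the sketch then marks a note onset wherever a pitch's activation rises from below the threshold to at or above it.

```python
import numpy as np


def threshold_from_pianissimo(pp_activation, scale=0.5):
    """Hypothetical rule: a fixed fraction of the peak activation of the pianissimo reference note."""
    return scale * float(np.max(pp_activation))


def detect_onsets(activations, threshold):
    """Return (pitch_index, frame_index) pairs where an activation rises above the threshold.

    activations has shape (num_pitches, num_frames). Values below the threshold
    are treated as estimation noise.
    """
    active = activations >= threshold
    # Prepend an inactive frame so that a note already active at frame 0 counts as an onset.
    padded = np.concatenate([np.zeros((active.shape[0], 1), dtype=bool), active], axis=1)
    rising = padded[:, 1:] & ~padded[:, :-1]
    pitches, frames = np.nonzero(rising)
    return list(zip(pitches.tolist(), frames.tolist()))


pp_activation = np.array([0.0, 0.8, 0.6, 0.2])    # made-up activation of the reference note
A = np.array([[0.0, 0.1, 0.9, 0.7, 0.1, 0.5],
              [0.6, 0.5, 0.0, 0.0, 0.0, 0.0]])    # made-up activations for two pitches
thr = threshold_from_pianissimo(pp_activation)     # 0.4
print(detect_onsets(A, thr))                       # [(0, 2), (0, 5), (1, 0)]
```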

RELATED WORK

PROPOSED MODEL

Encouraging Data Fidelity and Non-Negativity

Encouraging Sparsity

Encouraging a Temporally Meaningful Template Order

Constraining the Concurrency of Templates

Differentiating Between Estimation Noise and Note Events

Encouraging Meaningful Long-Term Note Activity

PARAMETER ESTIMATION USING THE ALTERNATING DIRECTION METHOD OF MULTIPLIERS

Consensus Form ADMM

Linearized ADMM

MINIMIZING THE INDIVIDUAL TERMS

KL Data Fidelity Term

Non-Negativity Term

LASSO and Total Diagonal Variation Terms

Markov-State Regularizer

Thresholding Set Term

Binary Markov-State and Strict Coupling Regularizer

EXPERIMENTS

Method

Influence of Individual Regularizers and Error Analysis

Findings

CONCLUSIONS

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Nov 1, 2016
Citations: 22
License type: CC BY 3.0
