Abstract
Given a musical audio recording, the goal of automatic music transcription is to determine a score-like representation of the piece underlying the recording. Despite significant interest within the research community, several studies have reported a 'glass ceiling' effect, an apparent limit on the transcription accuracy that current methods seem incapable of overcoming. In this paper, we explore how much this effect can be mitigated by focusing on a specific instrument class and making use of additional information on the recording conditions available in studio or home recording scenarios. In particular, exploiting the availability of single-note recordings for the instrument in use, we develop a novel signal model that employs variable-length spectro-temporal patterns as its central building blocks and is tailored for pitched percussive instruments such as the piano. Temporal dependencies between spectral templates are modeled, resembling characteristics of factorial scaled hidden Markov models (FS-HMMs) and other methods combining non-negative matrix factorization with Markov processes. In contrast to FS-HMMs, our parameter estimation is developed in a global, relaxed form within the extensible alternating direction method of multipliers (ADMM) framework, which enables the systematic combination of basic regularizers propagating sparsity and local stationarity in note activity with more complex regularizers imposing temporal semantics. The proposed method achieves an F-measure of 93-95% for note onsets on pieces recorded on a Yamaha Disklavier (MAPS DB).
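For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a note-onset F-measure can be computed. It is not the authors' evaluation code. The function name `onset_f_measure`, the `(pitch, onset_seconds)` note format, and the 50 ms matching tolerance are all assumptions made for illustration; the abstract does not state the tolerance used.

```python
from collections import defaultdict


def _count_hits(ref_times, est_times, tolerance):
    """Greedily match two sorted onset lists; each onset is used at most once."""
    i = j = hits = 0
    while i < len(ref_times) and j < len(est_times):
        if abs(ref_times[i] - est_times[j]) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif est_times[j] < ref_times[i]:
            j += 1  # estimated onset is too early to match any later reference
        else:
            i += 1  # reference onset was missed
    return hits


def onset_f_measure(reference, estimated, tolerance=0.05):
    """Return (precision, recall, f_measure) for lists of (pitch, onset_seconds).

    The 50 ms default tolerance is an assumption, not a value from the source.
    """
    ref_by_pitch = defaultdict(list)
    est_by_pitch = defaultdict(list)
    for pitch, onset in reference:
        ref_by_pitch[pitch].append(onset)
    for pitch, onset in estimated:
        est_by_pitch[pitch].append(onset)

    # An estimated note counts as correct only if pitch matches and the onset
    # lies within the tolerance of an unmatched reference onset.
    hits = sum(
        _count_hits(sorted(ref_by_pitch[p]), sorted(est_by_pitch[p]), tolerance)
        for p in ref_by_pitch
    )
    precision = hits / len(estimated) if estimated else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_measure
```

With three reference notes and four estimated notes of which two match, precision is 2/4, recall is 2/3, and the F-measure is their harmonic mean, 4/7.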
Highlights
Automatic music transcription (AMT) has a long history in music processing [1].
We develop a regularizer which encourages activations in A to follow transition rules for the templates described by the graphical model depicted in Fig. 2, i.e., we integrate the concepts behind the factorial scaled hidden Markov model (FS-HMM) and its variants [6].
We presented a method for transcribing pitched-percussive instruments such as the piano in controlled recording conditions.
Summary
Automatic music transcription (AMT) has a long history in music processing [1]. Because it identifies higher-level musical concepts such as notes in digital music recordings, it is often considered a key technology for the semantic analysis of music, with applications ranging from retrieval tasks in music informatics through computational musicology and performance analysis to creative music technology [2]. To obtain a practical transcription system, we assume that the user can play a note in pianissimo (low intensity) at the beginning of a recording session; our system uses this note to derive a threshold that differentiates between an active note and estimation noise. Given this scenario, we can tailor our proposed signal model to precisely this instrument class, which is necessary to account for the highly non-stationary behavior of the piano sound production process. This enables us, instead of strictly enforcing Markov properties as in FS-HMMs, to approximate the temporal transitions between spectral templates in a relaxed form by stating the parameter estimation problem as a structured sparse coding problem, which is controlled by simple convex regularizers. Using these regularizers, we can steer the solution close to a semantically meaningful progression similar to an FS-HMM solution.
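To illustrate what it means to control a sparse coding problem with simple convex regularizers inside ADMM, here is a deliberately tiny sketch. It is not the paper's signal model: it has no spectral templates and no temporal regularizer, and the function name `admm_sparse_nonneg` and the parameters `lam` and `rho` are hypothetical. It solves the toy problem of minimizing 0.5*||x - b||^2 + lam*||z||_1 subject to z >= 0 and x = z, which shows the three-step pattern (data-term update, regularizer proximal step, dual update) that makes the framework extensible.

```python
def admm_sparse_nonneg(b, lam=0.3, rho=1.0, iterations=200):
    """Toy scaled-form ADMM for a sparse, non-negative fit to the vector b.

    Minimizes 0.5*||x - b||^2 + lam*||z||_1 with z >= 0, subject to x = z.
    This is an illustrative stand-in, not the model from the source.
    """
    n = len(b)
    x = [0.0] * n
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable enforcing x = z

    for _ in range(iterations):
        # x-update: closed-form minimizer of the quadratic data term plus
        # the augmented-Lagrangian penalty (rho/2)*(x - z + u)^2.
        x = [(bi + rho * (zi - ui)) / (1.0 + rho) for bi, zi, ui in zip(b, z, u)]

        # z-update: proximal step of the regularizer. For an l1 penalty with
        # a non-negativity constraint this is a one-sided soft threshold.
        z = [max(xi + ui - lam / rho, 0.0) for xi, ui in zip(x, u)]

        # Dual update: accumulate the remaining disagreement between x and z.
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]

    return z
```

For this toy objective the exact solution is max(b - lam, 0) per entry, so `admm_sparse_nonneg([1.0, 0.1, -0.5])` should approach `[0.7, 0.0, 0.0]`. Under this splitting, adding another convex regularizer would amount to adding another auxiliary variable with its own proximal step, which is one way to read the "extensible" aspect described above.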
More From: IEEE/ACM Transactions on Audio, Speech, and Language Processing