Abstract

This article presents our research on polyphonic music transcription using non-negative matrix factorisation (NMF). Applying NMF to polyphonic transcription offers an alternative approach in which the observed frequency spectra of polyphonic audio are treated as an aggregation of spectra from monophonic components. However, a standard NMF procedure does not easily find accurate aggregations, since there are many ways to satisfy the factorisation V ≈ WH. Standard NMF applied to frequency spectra has three limitations: (i) the transcription output is permuted; (ii) the factorisation rank r is unknown; and (iii) the factors W and H tend to become trapped in a sub-optimal solution. This work explores heuristics that exploit the harmonic information of each pitch to tackle these limitations. In our implementation, this harmonic information is learned from training data consisting of the pitches of a desired instrument, while the unknown effective r is approximated from the correlation between the input signal and the training data. This approach makes effective use of domain knowledge. The empirical results show that the proposed approach can significantly improve the accuracy of the transcription output compared with the standard NMF approach.
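The permutation limitation can be illustrated with a small numerical sketch. It is not taken from the article: the matrix sizes and random values are arbitrary assumptions, chosen only to show that reordering the basis vectors of W together with the rows of H leaves the product WH unchanged, so the data alone cannot tell which row of H belongs to which pitch.

```python
import numpy as np

# Toy factorisation (sizes and values are illustrative assumptions).
rng = np.random.default_rng(0)
W = rng.random((6, 3))   # 6 frequency bins, r = 3 basis spectra
H = rng.random((3, 5))   # activations of each basis over 5 time frames
V = W @ H                # the "observed" spectrogram

# Reorder the columns of W and the rows of H in the same way.
perm = [2, 0, 1]
W_perm = W[:, perm]
H_perm = H[perm, :]

# The product is identical, although the rows of H (the transcription
# output) now appear in a different order.
same_product = np.allclose(V, W_perm @ H_perm)
rows_reordered = not np.allclose(H, H_perm)
print(same_product, rows_reordered)
```

Fixing W in advance, with one known column per pitch, removes this ambiguity because the row order of H is then tied to the pitch order of W.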

Highlights

  • Automatic music transcription concerns the translation of music sounds into written manuscripts in standard music notations

  • The matrix W of basis vectors is learned from each pitch from a desired instrument. This ensures that the basis vector (a.k.a. dictionary, Tone-model) represents the harmonic structure of each pitch at the expense of the basis vector matrix being applicable for that particular instrument only

  • We investigate the application of non-negative matrix factorisation (NMF) to extract polyphonic notes from a given polyphonic audio
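As a minimal sketch of transcription with a pre-learned W, the following keeps the basis matrix fixed and estimates only the activations H. Several things here are assumptions rather than the article's method: the synthetic harmonic templates stand in for Tone-models learned from real recordings, the update is the standard multiplicative rule for the Euclidean cost, and the names `harmonic_template` and `activations_for_fixed_dictionary` and the 0.5 threshold are hypothetical.

```python
import numpy as np

def harmonic_template(f0_bin, n_bins, n_harmonics=4):
    """Synthetic stand-in for a learned Tone-model: peaks at multiples of
    the fundamental bin with amplitudes decaying as 1/h (an assumption)."""
    w = np.zeros(n_bins)
    for h in range(1, n_harmonics + 1):
        k = f0_bin * h
        if k < n_bins:
            w[k] = 1.0 / h
    return w

def activations_for_fixed_dictionary(V, W, n_iter=1000, eps=1e-9):
    """Estimate H >= 0 with V ≈ WH while W stays fixed, using the standard
    multiplicative update for the Euclidean cost."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps  # positive initial guess
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

# Three toy "pitches" whose harmonics partly overlap (bins 8, 12, 16, 24).
n_bins = 64
W = np.column_stack([harmonic_template(f0, n_bins) for f0 in (4, 6, 8)])

# Ground-truth piano roll: rows are pitches, columns are time frames.
H_true = np.array([[1, 1, 0, 0],
                   [0, 1, 1, 0],
                   [0, 0, 0, 1]], dtype=float)
V = W @ H_true

H_est = activations_for_fixed_dictionary(V, W)
piano_roll = H_est > 0.5   # threshold chosen for this toy example only
print(piano_roll.astype(int))
```

Because each column of W is tied to a known pitch, each row of the estimated H can be read directly as that pitch's activity over time.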

Summary

Introduction

Automatic music transcription concerns the translation of music sounds into written manuscripts in standard music notation. Each neural network was trained to recognise one piano note with the frequency spectral features from approximately 30,000 samples, one-third of which were positive examples. Soft computing approaches such as connectionism, support vector machines, and hidden Markov models [23,24,26] usually require complete training data, as the performance of the model depends heavily on the decision boundary constructed from the training examples. The matrix W of basis vectors is learned from each pitch of a desired instrument. This ensures that the basis vector (a.k.a. dictionary, Tone-model) represents the harmonic structure of each pitch, at the expense of the basis vector matrix being applicable to that particular instrument only (e.g., the Tone-model learned from a piano will not work well with a violin). Many applications, such as a performance analysis module in a guitar tutoring system, could benefit from this. Each coefficient Xk is a complex number; its magnitude and phase represent the magnitude and phase of the frequency component at k·fs/N, where fs is the sampling rate and N is the number of samples in the transform.
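The relation between a coefficient Xk and the frequency k·fs/N can be checked with a short sketch. The sampling rate, transform length, and test tone below are illustrative assumptions, not values from the article; the tone is placed exactly on a bin so that the magnitude and phase can be read off without leakage.

```python
import numpy as np

fs = 8000          # sampling rate in Hz (assumed for illustration)
N = 1024           # number of samples in the transform (assumed)
k_true = 56        # bin index of the test tone
f = k_true * fs / N                      # 437.5 Hz, exactly on bin 56

# A cosine with amplitude 0.8 and phase 0.3 rad at that frequency.
n = np.arange(N)
x = 0.8 * np.cos(2 * np.pi * f * n / fs + 0.3)

X = np.fft.rfft(x)                       # complex coefficients X_0 .. X_{N/2}
k_peak = int(np.argmax(np.abs(X)))       # bin with the largest magnitude
peak_freq_hz = k_peak * fs / N           # frequency represented by that bin
amplitude = 2 * np.abs(X[k_peak]) / N    # scale the magnitude back to amplitude
phase = np.angle(X[k_peak])              # phase of the component in radians
print(k_peak, peak_freq_hz, amplitude, phase)
```

For a tone that does not fall exactly on a bin, the energy spreads over neighbouring bins and these readings become approximate.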

Piano roll representation
Switching off inactive pitches
Conclusions