Abstract

As the foundation of many applications, the multipitch estimation problem has long been a focus of acoustic music processing; however, existing algorithms perform poorly due to its complexity. In this paper, we employ deep learning to address the piano multipitch estimation problem by proposing MPENet, which is based on a novel multimodal sparse incoherent non-negative matrix factorization (NMF) layer. This layer originates from a multimodal NMF problem with a Lorentzian-BlockFrobenius sparsity constraint and an incoherence regularization. Experiments show that MPENet achieves state-of-the-art performance (83.65% F-measure at polyphony level 6) on the RAND subset of the MAPS dataset. MPENet enables NMF to perform online learning and accomplishes multi-label classification using only monophonic samples as training data. In addition, our layer algorithms can easily be modified and redeveloped for a wide variety of problems.
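The abstract builds on non-negative matrix factorization. As background, a minimal single-modality sketch of plain NMF with Lee-Seung multiplicative updates (Frobenius loss only, not the paper's Lorentzian-BlockFrobenius sparsity or incoherence terms) could look like:

```python
import numpy as np

def nmf(V, rank, iters=500, eps=1e-9):
    """Plain NMF: approximate non-negative V (m x n) as W @ H with
    W, H >= 0, via Lee-Seung multiplicative updates on the Frobenius loss."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy spectrogram-like matrix built from two "note templates"
W_true = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 1.0]])
H_true = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
V = W_true @ H_true

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(rel_err < 0.1)  # reconstruction is close for this exactly-factorable toy
```

In the audio setting, columns of `V` are short-time magnitude spectra, columns of `W` are learned note templates, and `H` holds their time-varying activations; the paper's layer adds sparsity and incoherence constraints on top of this basic decomposition.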

Highlights

  • Multipitch estimation problem (MPE, cf. [1,2,3,4] and references therein) is the concurrent identification of multiple notes in an acoustic polyphonic music clip, e.g., {C4, D4}, {E0, G2, A5}, {F3, A3, C4, E4, G4, B4, D5}, or other combinations. It is a prerequisite for Automatic Music Transcription (AMT, [5]), Musical Information Retrieval (MIR, [6]), and many other acoustic music processing applications

  • Based on and inspired by the above discussion, we in this paper propose the Multipitch estimation network (MPENet), a deep learning (DL, [17,18,19,20,21]) network enhanced by a novel multimodal sparse incoherent non-negative matrix factorization (NMF) layer (MSI-NMF layer)


Summary

Introduction

Multipitch estimation problem (MPE, cf. [1,2,3,4] and references therein) is the concurrent identification of multiple notes in an acoustic polyphonic music clip. Multipitch estimation network (MPENet): given the decomposition capability of the proposed layer, we employ a "one vs. all" strategy and present a unified deep learning network consisting of a training subnet and a test subnet. Other methods employing a similar idea but different implementations are proposed in [2,4,26,27,28,29]. Such procedures use a fixed dictionary to obtain note activations at test time, so MPE results depend heavily on the learned note templates, i.e., the training samples. Note that the deep learning methods listed here all use music pieces as training data, which means polyphonic information can be accessed and the music language model and classifiers are learned simultaneously
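The fixed-dictionary test procedure described above can be sketched as follows: hold the learned note templates `W` fixed, estimate only the activations `H` for a new frame, and threshold them to obtain a multi-label pitch decision. This is a minimal illustration (plain Frobenius-loss multiplicative updates, a hypothetical two-note dictionary, and an arbitrary threshold of 0.3), not the authors' MSI-NMF layer:

```python
import numpy as np

def activations(V, W, iters=300, eps=1e-9):
    """Given a fixed note-template dictionary W, estimate activations H
    by multiplicative updates on H only (W is not modified)."""
    rng = np.random.default_rng(1)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

# Hypothetical spectral templates for two notes (columns of W)
W = np.array([[1.0, 0.1],
              [0.2, 1.0],
              [0.6, 0.3]])

# One test frame synthesized to contain both notes
v = W @ np.array([[1.0], [0.8]])

H = activations(v, W)
detected = H[:, 0] > 0.3  # threshold activations -> multi-label output
print(detected)           # both notes exceed the threshold here
```

Because the dictionary is frozen at test time, the quality of the detected pitches is bounded by how well the learned templates cover the test instrument, which is exactly the dependence on training samples noted above.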

Notation
Backward pass
Findings
Conclusions
Full Text

