Abstract

In many types of music, percussion plays an essential role in establishing the rhythm and the groove of the music. Algorithms that can decompose the percussive signal into its constituent components would therefore be very useful, as they would enable many analytical and creative applications. This paper describes a method for the unsupervised decomposition of percussive recordings, building on the non-negative matrix factor deconvolution (NMFD) algorithm. Given a percussive music recording, NMFD discovers a dictionary of time-varying spectral templates and corresponding activation functions, representing its constituent sounds and their positions in the mix. We observe, however, that the activation functions discovered using NMFD do not show the expected impulse-like behavior for percussive instruments. We therefore enforce this behavior by specifying that the activations should take on binary values: either an instrument is hit, or it is not. To this end, we rewrite the activations as the output of a sigmoidal function, multiplied by a per-component amplitude factor. We furthermore define a regularization term that biases the decomposition toward solutions with saturated activations, leading to the desired binary behavior. We evaluate several optimization strategies and techniques that are designed to avoid poor local minima. We show that incentivizing the activations to be binary indeed leads to the desired impulse-like behavior, and that the resulting components are better separated, leading to more interpretable decompositions.
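As a rough illustration of the model the abstract refers to, the sketch below reconstructs a magnitude spectrogram from NMFD factors: each component contributes its time-varying spectral template, placed at every frame where its activation is non-zero. This is the standard convolutive NMFD approximation, not code from the paper; the function name `nmfd_reconstruct`, the nested-list layout, and the toy values are assumptions made for this example.

```python
def nmfd_reconstruct(templates, activations):
    """Approximate a magnitude spectrogram from NMFD factors.

    templates[r][k][tau]: magnitude of component r at frequency bin k,
                          tau frames after the onset (illustrative layout).
    activations[r][n]:    activation of component r at frame n.
    Returns v_hat[k][n] = sum over r and tau of
                          templates[r][k][tau] * activations[r][n - tau].
    """
    num_bins = len(templates[0])
    num_frames = len(activations[0])
    v_hat = [[0.0] * num_frames for _ in range(num_bins)]
    for template, activation in zip(templates, activations):
        for k, row in enumerate(template):
            for tau, weight in enumerate(row):
                # Shift the activation by tau frames; frames before 0 contribute nothing.
                for n in range(tau, num_frames):
                    v_hat[k][n] += weight * activation[n - tau]
    return v_hat


# Toy example: one component, two frequency bins, a template two frames long.
templates = [[[1.0, 0.5], [0.2, 0.1]]]
# Impulse-like activations: the instrument is hit at frames 1 and 4.
activations = [[0.0, 1.0, 0.0, 0.0, 1.0]]
v_hat = nmfd_reconstruct(templates, activations)
```

With impulse-like activations, each hit simply stamps a copy of the template into the spectrogram, which is why binary activations make the decomposition easy to read as "which sound occurred when".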

Highlights

  • Wu et al. [3] give a comprehensive overview of the state of the art in automatic drum transcription (ADT) and perform an in-depth comparison of these methods

  • We show that the proposed algorithm achieves more impulse-like activations compared to unconstrained non-negative matrix factor deconvolution (NMFD) and sparse NMFD, making it better suited to the properties of percussive mixtures, while yielding a good decomposition and spectrogram reconstruction quality

  • We investigated an adapted NMFD model where the activations are biased to be binary in nature, by defining them as the output of a sigmoidal function and by applying a regularization term to push their values to saturation
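The adapted model in the last highlight can be sketched as follows: each activation is a per-component amplitude multiplied by a sigmoid of a free parameter, and a penalty rewards sigmoid outputs close to 0 or 1. The source does not give the exact form of the regularization term, so the penalty `s * (1 - s)` below is a hypothetical stand-in chosen only because it vanishes at saturation; the function names and example values are likewise assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_activations(logits, amplitudes):
    """Activations h[r][n] = amplitudes[r] * sigmoid(logits[r][n]).

    logits[r][n] are unconstrained parameters; amplitudes[r] is the
    per-component amplitude factor.
    """
    return [[a * sigmoid(x) for x in row] for row, a in zip(logits, amplitudes)]


def saturation_penalty(logits):
    """Hypothetical regularizer (not the paper's exact term).

    Sums s * (1 - s) over all sigmoid outputs s. The sum is largest when
    every s equals 0.5 and approaches 0 when every s is close to 0 or 1.
    """
    total = 0.0
    for row in logits:
        for x in row:
            s = sigmoid(x)
            total += s * (1.0 - s)
    return total


# Saturated logits give near-binary activations and almost no penalty.
saturated = [[-20.0, 20.0]]
# Undecided logits sit at the sigmoid midpoint and are penalized most.
undecided = [[0.0, 0.0]]

h = gated_activations(saturated, [2.0])
penalty_saturated = saturated_value = saturation_penalty(saturated)
penalty_undecided = saturation_penalty(undecided)
```

Adding such a penalty to the reconstruction loss pushes each activation toward either zero or the component's amplitude, which corresponds to the "hit or not hit" behavior described above.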


Summary

Introduction

Wu et al. [3] give a comprehensive overview of the state of the art in automatic drum transcription (ADT) and perform an in-depth comparison of these methods. They identify two classes of “activation-based” methods that currently dominate the state of the art: on the one hand, neural network-based systems using Recurrent Neural Network [4,5] or Convolutional Neural Network [6] architectures, and on the other hand, methods based on non-negative matrix factorization (NMF) [3,7]. According to their analysis, neural network-based approaches outperform NMF-based methods in terms of transcription accuracy when a large and diverse training dataset with high-quality annotations is available. Unsupervised transcription systems, such as the ones based on NMF, can be used to improve supervised approaches by leveraging them in semi-supervised learning schemes such as student–teacher learning [8].
