Abstract

Deep learning has given a considerable boost to AI-based methods for music creation over the past years. An important challenge in this field is to balance user control and autonomy in music generation systems. In this work, we present BassNet, a deep learning model for generating bass guitar tracks based on musical source material. An innovative aspect of our work is that the model is trained to learn a temporally stable two-dimensional latent space variable that offers interactive user control. We empirically show that the model can disentangle bass patterns that require sensitivity to harmony, instrument timbre, and rhythm. An ablation study reveals that this capability is due to the temporal stability constraint imposed on latent space trajectories during training. We also demonstrate that models trained on pop/rock music learn a latent space that, among other things, offers control over the diatonic characteristics of the output. Lastly, we present and discuss generated bass tracks for three different music fragments. The work presented here is a step toward the integration of AI-based technology into the workflow of musical content creators.
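The exact form of the temporal stability constraint is not specified in this excerpt. The following is a minimal sketch of one plausible formulation, assuming a smoothness penalty on consecutive frames of the latent trajectory; the function name, tensor layout, and weighting factor `lambda_stability` are illustrative assumptions, not the paper's actual implementation.

```python
import torch

def temporal_stability_loss(z: torch.Tensor) -> torch.Tensor:
    """Penalize frame-to-frame movement of a latent trajectory.

    z: latent trajectory of shape (batch, time, latent_dim),
       e.g. latent_dim = 2 for a two-dimensional latent space.
    Returns the mean squared difference between consecutive latent vectors,
    which is small when the trajectory is temporally stable.
    """
    dz = z[:, 1:, :] - z[:, :-1, :]  # successive differences along the time axis
    return (dz ** 2).mean()

# Illustrative use inside a training step (lambda_stability is a hypothetical weight):
# loss = reconstruction_loss + lambda_stability * temporal_stability_loss(z)
```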

Highlights

  • The advent of machine learning has unlocked a wealth of possibilities for innovation in music creation, as in many other areas

  • We report and discuss the results of training the BassNet architecture separately on each of the datasets (A1–4 and PR), performing several training runs per dataset with varying window sizes

  • Because the models trained on the PR dataset do not allow for this type of ground-truth-based analysis of the latent space, we instead look at the effect of the latent variable on the distribution of predicted bass pitch values relative to the pitch values in the mix (Section 6.4); see the sketch after these highlights
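
As a rough illustration of the analysis mentioned in the last highlight, the sketch below computes the distribution of predicted bass pitches relative to the pitch content of the mix, so that histograms obtained for different latent settings can be compared. The pitch representation (MIDI numbers on a shared time grid) and the function name are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def relative_pitch_class_histogram(bass_pitches, mix_pitches):
    """Distribution of intervals (in pitch classes) between predicted bass
    notes and the simultaneous pitches of the mix.

    bass_pitches, mix_pitches: 1-D arrays of MIDI pitch numbers sampled on
    a common time grid (a hypothetical representation for illustration).
    """
    intervals = (np.asarray(bass_pitches) - np.asarray(mix_pitches)) % 12
    hist, _ = np.histogram(intervals, bins=np.arange(13), density=True)
    return hist  # 12 bins: proportion of each interval class relative to the mix

# Comparing histograms produced at different latent coordinates indicates how the
# latent variable shifts the pitch distribution of the generated bass track.
```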


Summary

Introduction

The advent of machine learning has unlocked a wealth of possibilities for innovation in music creation, as in many other areas. A common approach to music creation using AI is to train a generative probabilistic model of the data [3,4,5] and create music by iterative sampling, possibly subject to constraints [6,7]. Apart from imposing constraints, as mentioned above, a common way of exercising control over the output of models for music creation is to provide conditioning signals based on which the output is generated. Since the distribution of the latent space is not known in general, there is no efficient way to sample from common autoencoders. Possible ways to impose a prior on the latent space are to use auxiliary losses, as in Adversarial Autoencoders [12], or to sample directly from parametric distributions, as in Variational Autoencoders (VAEs) [13].
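To make the last point concrete, here is a minimal, generic sketch of the VAE approach mentioned above: the encoder outputs a mean and log-variance, a latent sample is drawn with the reparameterization trick, and a KL term pulls the posterior toward a standard normal prior. This is a textbook illustration, not a description of BassNet's architecture.

```python
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Draw z ~ N(mu, sigma^2) in a differentiable way (reparameterization trick)."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + eps * std

def kl_to_standard_normal(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """KL divergence between N(mu, sigma^2) and the standard normal prior,
    summed over latent dimensions and averaged over the batch."""
    return (-0.5 * (1.0 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)).mean()

# Training a VAE minimizes reconstruction error plus this KL term, which makes the
# latent space straightforward to sample from, unlike a plain autoencoder.
```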
