Audio (vector) algebra: Vector space operations on neural audio embeddings

Scott H Hawley, Joe Baldridge, Zach Evans

doi:10.1121/10.0015957

Abstract

Ever since Castellon, Donahue, and Liang (ISMIR 2021) showed that the latent-space “embedding” representations encoded by OpenAI's Jukebox model contain semantically meaningful information about music, many have wondered whether such embeddings support vector relations akin to the famous “king − man + woman = queen” result seen in word vector embeddings. Such an “audio (vector) algebra” would provide a way to perform operations on audio by displacing the embeddings in certain directions and then decoding them to new sounds. The nonlinear aspects of the encoding process suggest that this may not be possible in general. However, for certain kinds of operations in finite regions of embedding space, such embedding vector transformations may indeed have musically relevant counterparts. In this talk, we investigate the feasibility of such schemes for the cases of mixing and audio effects.
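To make the mixing question concrete, the following is a minimal sketch under stated assumptions, not the method used in the talk. It uses a hypothetical toy encoder (a fixed random projection followed by tanh) as a stand-in for a real neural audio encoder, and random vectors as stand-ins for audio. It compares the embedding of a mix, encode(a + b), with the sum of the individual embeddings, encode(a) + encode(b), at two signal levels. All names (`W`, `encode`, `mixing_error`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy encoder: a fixed random projection followed by tanh.
# It is an assumption for illustration only and is NOT the Jukebox encoder.
W = rng.standard_normal((16, 256)) / np.sqrt(256)


def encode(x):
    """Map a 256-sample 'signal' to a 16-dimensional embedding."""
    return np.tanh(W @ x)


def mixing_error(scale):
    """Relative error between encode(a + b) and encode(a) + encode(b)."""
    # Random vectors stand in for two audio signals at a given amplitude.
    a = scale * rng.standard_normal(256)
    b = scale * rng.standard_normal(256)
    mixed = encode(a + b)            # embedding of the mixed signal
    summed = encode(a) + encode(b)   # vector sum of the two embeddings
    return np.linalg.norm(mixed - summed) / np.linalg.norm(mixed)


small = mixing_error(0.01)  # low amplitude: tanh is nearly linear
large = mixing_error(3.0)   # high amplitude: tanh saturates

print(f"relative error, low-amplitude signals:  {small:.6f}")
print(f"relative error, high-amplitude signals: {large:.6f}")
```

In this toy setting, vector addition of embeddings approximates mixing only where the encoder behaves nearly linearly, which loosely mirrors the point that such operations may hold in finite regions of embedding space rather than in general. Whether anything similar holds for a trained audio model is an empirical question that this sketch does not answer.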
