Abstract

Ever since Castellon, Donahue, and Liang (ISMIR 2021) showed that the latent-space “embedding” representations encoded by OpenAI's Jukebox model contain semantically meaningful information about music, many have wondered whether such embeddings support vector relations akin to the famous “king - man + woman = queen” result seen in word vector embeddings. Such an “audio (vector) algebra” would provide a way to perform operations on audio by displacing the embeddings in certain directions and then decoding them to new sounds. The nonlinear aspects of the encoding process suggest that this may not be possible in general. However, for certain kinds of operations in finite regions of embedding space, such embedding vector transformations may indeed have musically relevant counterparts. In this talk we investigate the feasibility of such schemes for the cases of mixing and audio effects.
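
As a rough illustration of why such relations might hold only in limited regions, the sketch below checks the mixing relation encode(a + b) ≈ encode(a) + encode(b) for a toy nonlinear encoder. The encoder here (a fixed random projection followed by tanh) is a hypothetical stand-in chosen purely for illustration; it is not Jukebox and not the method used in the talk, and the names `encode` and `mixing_error` are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in encoder: fixed random projection + tanh nonlinearity.
# This is NOT Jukebox; it only mimics "a nonlinear map from audio to embeddings".
W = rng.normal(size=(16, 256)) / np.sqrt(256)

def encode(audio):
    """Map a 256-sample 'audio' vector to a 16-dimensional embedding."""
    return np.tanh(W @ audio)

def mixing_error(a, b):
    """Relative gap between encode(a + b) and encode(a) + encode(b)."""
    mixed = encode(a + b)
    summed = encode(a) + encode(b)
    return np.linalg.norm(mixed - summed) / np.linalg.norm(mixed)

a = rng.normal(size=256)
b = rng.normal(size=256)

# Quiet signals stay in the near-linear region of tanh, so mixing in audio
# space is close to vector addition in embedding space.
small = mixing_error(0.01 * a, 0.01 * b)

# Loud signals push tanh into saturation, and the additive relation degrades.
large = mixing_error(5.0 * a, 5.0 * b)

print(f"relative error, quiet inputs: {small:.2e}")
print(f"relative error, loud inputs:  {large:.2e}")
```

In this toy setting the additive relation is accurate only while inputs remain in the encoder's near-linear regime, which is one simple way to picture an algebra that is valid in a finite region of embedding space but not in general.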


