Musical audio samples generated from joint text embeddings

Zach Evans, Katherine Crowson, Scott H Hawley

doi:10.1121/10.0015956

Journal: The Journal of the Acoustical Society of America

Publication Date: Oct 1, 2022

Affiliation: Belmont University

Keywords: musical sounds, text descriptions

Abstract

The field of machine learning has benefited from the emergence of diffusion-based generative models for images and audio. While text-to-image models have become increasingly prevalent, text-to-audio generative models remain an active area of research. We present work on generating short musical instrument samples with a model conditioned on text descriptions and on the file-structure labels of large sample libraries. Preliminary findings indicate that wide-spectrum sounds such as percussion are not difficult to generate, while harmonic musical sounds present challenges for audio diffusion models.
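
The abstract does not say how file-structure labels are turned into conditioning text. The sketch below is purely illustrative and assumes a hypothetical scheme. Each directory name and the file stem are lower-cased. Underscores and hyphens become spaces, and purely numeric tokens such as take numbers are dropped. The results are then de-duplicated and joined into a comma-separated prompt. The function name `path_to_prompt` and every cleaning rule are our assumptions, not the authors' method.

```python
from pathlib import PurePosixPath


def path_to_prompt(path: str) -> str:
    """Build a text prompt from a sample file's folder labels and file name.

    Hypothetical scheme (an assumption, not the paper's method): every
    directory component plus the file stem is treated as a label.
    """
    p = PurePosixPath(path)
    # Directory components followed by the file name without its extension.
    parts = list(p.parent.parts) + [p.stem]
    labels: list[str] = []
    for part in parts:
        if part in ("/", "", "."):
            continue  # skip the filesystem root and empty components
        # Treat underscores and hyphens as word separators.
        words = part.replace("_", " ").replace("-", " ").lower().split()
        # Drop purely numeric tokens such as "01" (take or variation numbers).
        words = [w for w in words if not w.isdigit()]
        label = " ".join(words)
        if label and label not in labels:  # keep the first occurrence only
            labels.append(label)
    return ", ".join(labels)


if __name__ == "__main__":
    print(path_to_prompt("Drums/Snares/Acoustic_Snare-01.wav"))
    # → drums, snares, acoustic snare
```

In a setup like the one the title describes, a string of this kind would then be passed to whatever text encoder produces the joint embeddings. That step is not shown here.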