Abstract

We present a survey of available frameworks for developing acoustical signal processing models based on deep neural networks. Given that this is a dynamic space, with new frameworks, libraries, and even companies appearing on timescales measured in months, we provide an up-to-date assessment of the strengths, popularity, and near-future directions of several tools and platforms available for research and product deployment of deep learning models of audio signal processing. In addition, those new to these spaces may be unaware of software systems that allow them to obtain and interrogate results more quickly and easily, while also integrating near-state-of-the-art optimization methods. Tools, packages, and platforms covered include PyTorch, TensorFlow, Keras, JAX, fastai, PyTorch Lightning, Julia, nbdev, Hugging Face, Weights & Biases, and Gradio. Examples will be drawn from the speaker's recent research publications in musical signal processing and computer vision applied to musical acoustics, as well as recent work by others. The goal of the talk is to provide acoustics researchers, educators, and students with a set of helpful possibilities for pursuing and improving their understanding, research practices, and communications.
