Audio Signal Mapping into Spectrogram-Based Images for Deep Learning Applications

Dejan Ciric, Nikola Vucic, Jelena Nikolic, Zoran Peric

doi:10.1109/infoteh51037.2021.9400698

Abstract

Various features generated from raw audio signals can be used as the input to a deep learning model. They include hand-crafted features such as mel-frequency cepstral coefficients, two-dimensional time-frequency representations, and raw audio data. In most cases, the time-frequency representations take the form of so-called spectrogram-based images. Having an image at the deep learning input makes it possible to apply the performance improvements accumulated in video and image processing. However, spectrogram-based images have some specific properties that should be taken into account when a deep learning model is designed. This paper deals with the mapping of audio signals into the most common spectrogram-based images. Some unique properties of these images, as well as the way they are generated, are analyzed here for the particular case of fridge sounds.
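As a rough illustration of what mapping an audio signal into a spectrogram-based image can involve, the sketch below computes a log-magnitude short-time Fourier transform with NumPy and rescales it to an 8-bit grayscale array. It is not the procedure used in the paper: the function name `spectrogram_image`, the Hann window, the frame length, the hop size and the 80 dB dynamic range are all assumptions chosen for illustration only.

```python
import numpy as np


def spectrogram_image(signal, frame_len=1024, hop=256, dynamic_range_db=80.0):
    """Map a 1-D audio signal to an 8-bit log-magnitude spectrogram image.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Zero-pad very short signals so that at least one frame exists.
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))

    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)

    # Index matrix that slices the signal into overlapping frames,
    # giving an array of shape (n_frames, frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * window

    # One-sided magnitude spectrum per frame; transpose so that
    # rows are frequency bins and columns are time frames.
    mag = np.abs(np.fft.rfft(frames, axis=1)).T

    # Log compression, clipped to a fixed dynamic range below the peak.
    db = 20.0 * np.log10(np.maximum(mag, 1e-10))
    db = np.maximum(db, db.max() - dynamic_range_db)

    # Linear rescaling of the dB values to the 0..255 pixel range.
    span = db.max() - db.min()
    scaled = (db - db.min()) / span if span > 0 else np.zeros_like(db)
    image = np.round(255.0 * scaled).astype(np.uint8)

    # Flip vertically so that low frequencies end up in the bottom rows,
    # as in a conventional spectrogram plot.
    return image[::-1]


if __name__ == "__main__":
    # Synthetic 1 kHz tone sampled at 16 kHz (a stand-in for a real recording).
    fs = 16000
    t = np.arange(fs) / fs
    tone = np.sin(2 * np.pi * 1000.0 * t)
    img = spectrogram_image(tone)
    print(img.shape, img.dtype)
```

The resulting array has one row per frequency bin and one column per time frame, so it can be passed to an image-based model like any single-channel image. Unlike a natural image, however, its two axes carry different physical meanings (frequency and time), which is one of the properties such a model design would need to account for.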
