Learning Temporal Resolution in Spectrogram for Audio Classification

Haohe Liu, Wenwu Wang, Xubo Liu, Mark D. Plumbley, Qiuqiang Kong

doi:10.1609/aaai.v38i12.29294

Abstract

The audio spectrogram is a time-frequency representation that has been widely used for audio classification. One of the key attributes of the audio spectrogram is the temporal resolution, which depends on the hop size used in the Short-Time Fourier Transform (STFT). Previous works generally assume the hop size should be a constant value (e.g., 10 ms). However, a fixed temporal resolution is not always optimal for different types of sound. The temporal resolution affects not only classification accuracy but also computational cost. This paper proposes a novel method, DiffRes, that enables differentiable temporal resolution modeling for audio classification. Given a spectrogram calculated with a fixed hop size, DiffRes merges non-essential time frames while preserving important frames. DiffRes acts as a "drop-in" module between an audio spectrogram and a classifier and can be jointly optimized with the classification task. We evaluate DiffRes on five audio classification tasks, using mel-spectrograms as the acoustic features, followed by off-the-shelf classifier backbones. Compared with previous methods using a fixed temporal resolution, the DiffRes-based method can achieve equivalent or better classification accuracy with at least 25% computational cost reduction. We further show that DiffRes can improve classification accuracy by increasing the temporal resolution of input acoustic features, without adding to the computational cost.
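
The abstract makes two quantitative points. First, the hop size sets how many STFT frames a clip produces. Second, merging unimportant frames shortens the sequence the classifier must process. Both can be illustrated with a short pure-Python sketch. The sketch assumes a centered STFT with an even n_fft, which gives 1 + num_samples // hop frames (the `center=True` behaviour of `torch.stft` and `librosa.stft`). The helper names `num_frames` and `merge_frames` and the importance scores are hypothetical. The greedy hard merge is a swapped-in, non-differentiable stand-in chosen for clarity. It is not DiffRes, which learns where to merge and is trained jointly with the classifier.

```python
from typing import List, Sequence

Frame = List[float]


def num_frames(num_samples: int, hop: int) -> int:
    """Frame count of a centered STFT (padding n_fft // 2 per side, even n_fft)."""
    return 1 + num_samples // hop


def merge_frames(frames: Sequence[Sequence[float]],
                 scores: Sequence[float],
                 target_len: int) -> List[Frame]:
    """Greedily average adjacent low-importance frames until target_len remain.

    Illustrative hard merge only (assumption): NOT the DiffRes algorithm,
    which is differentiable and learns its importance scores.
    """
    if len(frames) != len(scores):
        raise ValueError("need one importance score per frame")
    if not 1 <= target_len <= len(frames):
        raise ValueError("target_len must be in [1, len(frames)]")
    # Each segment keeps (running sum of frame vectors, frame count, total score).
    segs = [([float(v) for v in f], 1, float(s)) for f, s in zip(frames, scores)]
    while len(segs) > target_len:
        # Merge the adjacent pair whose combined importance is lowest, so
        # high-score (important) frames tend to survive unmerged.
        i = min(range(len(segs) - 1), key=lambda k: segs[k][2] + segs[k + 1][2])
        (a_sum, a_n, a_s), (b_sum, b_n, b_s) = segs[i], segs[i + 1]
        segs[i:i + 2] = [([x + y for x, y in zip(a_sum, b_sum)], a_n + b_n, a_s + b_s)]
    # Each output frame is the mean of the input frames it absorbed.
    return [[x / n for x in total] for total, n, _ in segs]


if __name__ == "__main__":
    clip = 2 * 16_000                     # 2 s at 16 kHz
    coarse = num_frames(clip, hop=160)    # 10 ms hop
    fine = num_frames(clip, hop=80)       # 5 ms hop (finer temporal resolution)
    # Hypothetical 1-bin "spectrogram" whose value is the frame index, with an
    # important event covering frames 180-199 of the fine spectrogram.
    frames = [[float(t)] for t in range(fine)]
    scores = [1.0 if 180 <= t < 200 else 0.01 for t in range(fine)]
    # Finer input squeezed to the same frame budget as the 10 ms hop.
    same_cost = merge_frames(frames, scores, target_len=coarse)
    # Roughly 25% fewer frames than the 10 ms hop.
    cheaper = merge_frames(frames, scores, target_len=int(0.75 * coarse))
    print(coarse, fine, len(same_cost), len(cheaper))  # 201 401 201 150
```

Read frame count as a proxy for cost only under a further assumption: that the classifier's cost grows roughly linearly with the number of frames. Self-attention, for example, grows faster than linearly. Under that assumption, `cheaper` mirrors the reduced-cost setting and `same_cost` mirrors the finer-resolution, equal-cost setting. In both cases the 20 high-score event frames pass through unmerged.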

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Mar 24, 2024
Citations: 3

Similar Papers

DTF-AT: Decoupled Time-Frequency Audio Transformer for Event Classification
Tony Alex ... Philip Jb Jackson
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 38
24 Mar 2024

Learning long-term filter banks for audio source separation and audio scene classification
Teng Zhang ... Ji Wu
EURASIP Journal on Audio, Speech, and Music Processing | VOL. 2018
30 May 2018

Ensemble of convolutional neural networks to improve animal audio classification
Loris Nanni ... Yandre M G Costa
EURASIP Journal on Audio, Speech, and Music Processing | VOL. 2020
26 May 2020

Audio Classification and Retrieval Using Wavelets and Gaussian Mixture Models
Ching-Hua Chuan
International Journal of Multimedia Data Engineering and Management | VOL. 4
01 Jan 2013
