Receptive Field Regularization Techniques for Audio Classification and Tagging With Deep Convolutional Neural Networks

Khaled Koutini, Hamid Eghbal-Zadeh, Gerhard Widmer

doi:10.1109/taslp.2021.3082307

Abstract

In this paper, we study the performance of variants of well-known Convolutional Neural Network (CNN) architectures on different audio tasks. We show that tuning the Receptive Field (RF) of CNNs is crucial to their generalization. An insufficient RF limits the CNN's ability to fit the training data. In contrast, CNNs with an excessive RF tend to overfit the training data and fail to generalize to unseen testing data. As state-of-the-art CNN architectures, in computer vision and other domains, tend to go deeper in terms of number of layers, their RF size increases, and their performance therefore degrades in several audio classification and tagging tasks. We study well-known CNN architectures and how their building blocks affect their receptive field. We propose several systematic approaches to control the RF of CNNs and systematically test the resulting architectures on different audio classification and tagging tasks and datasets. The experiments show that regularizing the RF of CNNs using our proposed approaches can drastically improve the generalization of models, outperforming complex architectures and models pre-trained on larger datasets. The proposed CNNs achieve state-of-the-art results in multiple tasks, from acoustic scene classification to emotion and theme detection in music to instrument recognition, as demonstrated by top ranks in several pertinent challenges (DCASE, MediaEval).
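To make the link between depth and RF size concrete, the following is a minimal sketch of the standard receptive-field recurrence for a stack of convolution or pooling layers along one axis. It is not the authors' implementation: the function name `receptive_field` and the three example layer stacks are hypothetical, and swapping later 3x3 kernels for 1x1 kernels is shown only as one illustrative way an RF could be limited, assumed here for demonstration.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field size along one axis for a stack of layers.

    Each layer is given as (kernel_size, stride). Dilation is assumed to be 1.
    """
    rf = 1    # a single input position is seen before any layer is applied
    jump = 1  # distance in input positions between adjacent output positions
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the RF by (k - 1) * jump
        jump *= stride             # striding spreads later kernels further apart
    return rf


# Hypothetical layer stacks, chosen only for illustration.
shallow = [(3, 1)] * 4                         # four 3x3 convolutions
deep = [(3, 1)] * 8                            # eight 3x3 convolutions
deep_restricted = [(3, 1)] * 4 + [(1, 1)] * 4  # same depth, later kernels are 1x1

for name, stack in [("shallow", shallow), ("deep", deep),
                    ("deep_restricted", deep_restricted)]:
    print(name, receptive_field(stack))
```

In this sketch, doubling the depth roughly doubles the RF, while the restricted stack keeps the depth of the deep network but the RF of the shallow one, since a 1x1 kernel adds nothing to the recurrence.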

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2021
Citations: 23

Similar Papers

Texture Patterns for Object Recognition and Content-Based Color Image Retrieval
21 Dec 2020

Receptive-Field-Regularized CNN Variants for Acoustic Scene Classification
Khaled Koutini ... Gerhard Widmer
01 Jan 2019

A novel study for automatic two-class COVID-19 diagnosis (between COVID-19 and Healthy, Pneumonia) on X-ray images using texture analysis and 2-D/3-D convolutional neural networks.
Huseyin Yaşar ... Murat Ceylan
Multimedia systems | VOL. 37
29 Jan 2022

Classification of Hand-Drawn Basic Circuit Components Using Convolutional Neural Networks
Mihriban Gunay ... Murat Koseoglu
01 Jun 2020
