Deep Convolutional Neural Network with Mixup for Environmental Sound Classification

Zhichao Zhang, Shan Cao, Shunqing Zhang, Shugong Xu

doi:10.1007/978-3-030-03335-4_31

Abstract

Environmental sound classification (ESC) is an important and challenging problem. In contrast to speech, sound events have a noise-like nature and may be produced by a wide variety of sources. In this paper, we propose a novel deep convolutional neural network for ESC tasks. Our network architecture uses stacked convolutional and pooling layers to extract high-level feature representations from spectrogram-like features. Furthermore, we apply mixup to ESC tasks and explore its impact on classification performance and feature distribution. Experiments were conducted on the UrbanSound8K, ESC-50, and ESC-10 datasets. The results demonstrate that our ESC system achieves state-of-the-art performance (83.7%) on UrbanSound8K and competitive performance on ESC-50 and ESC-10.
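For readers unfamiliar with mixup, the following is a minimal sketch of the general technique, not the authors' implementation. Mixup trains on convex combinations of pairs of examples and their labels, with the mixing weight drawn from a Beta(alpha, alpha) distribution. The function name `mixup_batch`, the value `alpha=0.2`, the patch shape, and the choice to pair examples within a batch by random permutation are all illustrative assumptions; none of them is stated in the abstract.

```python
import numpy as np


def mixup_batch(features, labels, alpha=0.2, rng=None):
    """Mix each example in a batch with a randomly chosen partner.

    features: array of shape (batch, ...), e.g. spectrogram-like patches.
    labels:   one-hot array of shape (batch, num_classes).
    alpha:    Beta distribution parameter (0.2 is an assumed value).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)           # mixing weight in [0, 1]
    perm = rng.permutation(len(features))  # partner index for each example
    mixed_features = lam * features + (1.0 - lam) * features[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_features, mixed_labels, lam


# Usage with dummy data: 4 patches of assumed shape (60, 41), 10 classes.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 60, 41)).astype(np.float32)
labels = np.eye(10, dtype=np.float32)[[0, 3, 3, 7]]
mixed_x, mixed_y, lam = mixup_batch(features, labels, alpha=0.2, rng=rng)
```

Because the labels are mixed with the same weight as the inputs, each mixed label row remains a valid probability vector, so the usual cross-entropy loss can be applied to it unchanged.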

Full Text