Classification of Environmental Sounds Through Spectrogram-Like Images Using Dilation-Based CNN

Asadulla Ashurov, Yu Zhao, Liming Shi, Hongqing Liu, Yi Zhou

doi:10.1109/iccc56324.2022.10065648

Abstract

In recent years, convolutional neural networks (CNNs) have achieved remarkable success in various image and audio categorization and detection tasks. Using dilated convolutions in a CNN extends the network's receptive field and can improve its performance. It can also yield a lightweight model by compressing the CNN. Environmental Sound Classification (ESC) is an essential part of the non-speech audio classification problem and has attracted the scientific community's interest owing to breakthroughs in learning algorithms for ESC. In this study, we propose a novel dilation-based CNN model trained from scratch, together with a baseline classic CNN model built under the same conditions and parameters. The dilation-based approach is very flexible and can be applied to several standard CNNs to improve their performance. In addition, this strategy trains efficiently. Experimental results on the publicly available UrbanSound8K dataset indicate that the proposed technique outperforms the classic CNN methods: the dilation-based network improves accuracy and F1 score by around 4 to 9% compared with the baseline method.
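To make the receptive-field argument concrete, the following is a minimal pure-Python sketch of a stride-1, unpadded 2-D dilated convolution (computed as cross-correlation, as CNN layers do) plus a receptive-field calculator for a stack of such layers. It is not the authors' model: the abstract gives no layer layout or dilation rates, so the 3x3 kernels, the rates 1, 2, 4, and the toy 9x9 "spectrogram" are illustrative assumptions, as are the helper names `dilated_conv2d` and `receptive_field`.

```python
from typing import List, Sequence


def dilated_conv2d(image: Sequence[Sequence[float]],
                   kernel: Sequence[Sequence[float]],
                   dilation: int = 1) -> List[List[float]]:
    """'Valid' 2-D dilated cross-correlation with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    # Kernel footprint after inserting (dilation - 1) gaps between taps.
    eff_h = (kh - 1) * dilation + 1
    eff_w = (kw - 1) * dilation + 1
    out_h = len(image) - eff_h + 1
    out_w = len(image[0]) - eff_w + 1
    out: List[List[float]] = []
    for i in range(out_h):
        row: List[float] = []
        for j in range(out_w):
            acc = 0.0
            # Same kh*kw weights as an ordinary kernel, but the taps
            # read input pixels spaced `dilation` apart.
            for u in range(kh):
                for v in range(kw):
                    acc += kernel[u][v] * image[i + u * dilation][j + v * dilation]
            row.append(acc)
        out.append(row)
    return out


def receptive_field(kernel_sizes: Sequence[int], dilations: Sequence[int]) -> int:
    """Receptive field (in input pixels, per axis) of stacked stride-1 conv layers."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d  # each layer widens the field by its effective span
    return rf


if __name__ == "__main__":
    # Hypothetical toy input: a 9x9 "spectrogram" with one active time-frequency bin.
    img = [[0.0] * 9 for _ in range(9)]
    img[4][4] = 1.0
    ones3x3 = [[1.0] * 3 for _ in range(3)]

    print(len(dilated_conv2d(img, ones3x3, dilation=1)))  # 7 (footprint 3x3)
    print(len(dilated_conv2d(img, ones3x3, dilation=2)))  # 5 (footprint 5x5)
    print(receptive_field([3, 3, 3], [1, 1, 1]))          # 7
    print(receptive_field([3, 3, 3], [1, 2, 4]))          # 15
```

With three 3x3 layers, the plain stack sees a 7x7 patch of the input, while dilation rates 1, 2, 4 widen this to 15x15 with exactly the same number of weights (9 per kernel per channel pair). This illustrates how dilation can enlarge the receptive field without enlarging the model: a single 3x3 kernel with dilation 2 covers a 5x5 region using 9 weights instead of the 25 a dense 5x5 kernel would need.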
