Abstract

Acoustic Scene Classification (ASC) aims to recognize the acoustic scene in recorded audio signals. An acoustic scene is a mixture of background sounds and various sound events, and the sound events often determine the type of scene. However, most existing ASC methods pay little attention to the information carried by sound events. In this paper, we combine the ASC task with the Sound Event Detection (SED) task and propose a new CNN approach with Multi-Task Learning (MTL), which uses SED as an auxiliary task so that the model pays more attention to sound-event information. In addition, since sound events are characterized by high-energy time-frequency components, we replace the Fully Connected (FC) layer of the traditional CNN with Global Max Pooling (GMP). The advantage is that the model focuses on the distinct high-energy time-frequency components of the audio signal (the sound events). Finally, extensive experiments are carried out on the TUT Acoustic Scenes 2017 dataset. Our proposed CNN approach with MTL shows better generalization and improves the Unweighted Average Recall (UAR) by 5.2% over the DCASE 2017 ASC baseline system.
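The role of Global Max Pooling described above can be illustrated with a minimal sketch, not taken from the paper itself: the shapes, values, and function name below are hypothetical, and NumPy stands in for the actual CNN framework. GMP collapses each channel's time-frequency feature map to its single maximum activation, so the classifier is driven by the strongest (high-energy) response per channel rather than by every position, as an FC layer over the flattened map would be.

```python
import numpy as np

def global_max_pooling(feature_maps):
    """Collapse each (time, freq) feature map to its maximum activation.

    feature_maps: array of shape (channels, time, freq).
    Returns a vector of shape (channels,), one value per channel,
    so classification depends only on each channel's strongest
    time-frequency response (e.g. a prominent sound event).
    """
    return feature_maps.max(axis=(1, 2))

# Toy example: 2 channels over a 3x4 time-frequency grid.
maps = np.zeros((2, 3, 4))
maps[0, 1, 2] = 5.0   # a strong "sound event" response in channel 0
maps[1, 0, 3] = 2.5   # a weaker response in channel 1
print(global_max_pooling(maps))
```

Unlike an FC layer, this pooling has no trainable parameters and is invariant to where in the time-frequency plane the strong component occurs, which matches the intuition that a sound event can appear anywhere within the scene.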
