Abstract

Acoustic Scene Classification (ASC) aims to recognize the acoustic scene in recorded audio signals. An acoustic scene is a mixture of background sounds and various sound events, and the sound events often determine the type of scene. However, most existing ASC methods pay little attention to the information carried by sound events. In this paper, we combine the ASC task with the Sound Event Detection (SED) task and propose a new CNN approach with Multi-Task Learning (MTL), which uses SED as an auxiliary task so that the model pays more attention to sound-event information. In addition, since sound events are characterized by high-energy time-frequency components, we replace the Fully Connected (FC) layer of the traditional CNN with Global Max Pooling (GMP). The advantage is that the model focuses on the distinct high-energy time-frequency components of the audio signal (the sound events). Finally, extensive experiments are carried out on the TUT Acoustic Scenes 2017 dataset. Our proposed CNN approach with MTL shows better generalization and improves the Unweighted Average Recall (UAR) by 5.2% over the DCASE 2017 ASC baseline system.
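The role of Global Max Pooling described above can be illustrated with a minimal sketch, not taken from the paper itself: the shapes, values, and function name below are hypothetical, and NumPy stands in for the actual CNN framework. GMP collapses each channel's time-frequency feature map to its single maximum activation, so the classifier is driven by the strongest (high-energy) response per channel rather than by every position, as an FC layer over the flattened map would be.

```python
import numpy as np

def global_max_pooling(feature_maps):
    """Collapse each (time, freq) feature map to its maximum activation.

    feature_maps: array of shape (channels, time, freq).
    Returns a vector of shape (channels,), one value per channel,
    so classification depends only on each channel's strongest
    time-frequency response (e.g. a prominent sound event).
    """
    return feature_maps.max(axis=(1, 2))

# Toy example: 2 channels over a 3x4 time-frequency grid.
maps = np.zeros((2, 3, 4))
maps[0, 1, 2] = 5.0   # a strong "sound event" response in channel 0
maps[1, 0, 3] = 2.5   # a weaker response in channel 1
print(global_max_pooling(maps))
```

Unlike an FC layer, this pooling has no trainable parameters and is invariant to where in the time-frequency plane the strong component occurs, which matches the intuition that a sound event can appear anywhere within the scene.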
