Abstract

Acoustic scene classification (ASC) is a fundamental task in computational sound scene analysis that aims to identify the acoustic environment from an audio recording. Many multitask learning (MTL) models have been proposed in computational sound scene analysis, but most of them target acoustic event detection (AED). Existing MTL models for ASC usually exchange knowledge between the primary and auxiliary tasks only through shared layers and train the network using hard labels. They do not exploit the information contained in the primary and auxiliary tasks to improve generalization performance, and they ignore the relationships among events, scenes, and groups. Moreover, some models suffer from subjectivity because their labels are generated from human observation, and such subjectivity can introduce unreasonable information that restricts further improvement of system performance. To address these issues, we propose a novel MTL scheme for ASC that employs a mutual attention mechanism to exploit the information contained in the primary and auxiliary tasks and a neural topic model to generate soft group labels automatically. The proposed method can model the relationships between groups and allows the primary and auxiliary tasks to make full use of each other's information to improve generalization performance. Experimental results on two real-world datasets show that our MTL scheme makes full use of the auxiliary task to improve the performance of the primary ASC task and achieves significant improvements over the baselines.
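The abstract does not spell out the training objective, but the general pattern it describes, a primary scene-classification loss with hard labels combined with an auxiliary group-classification loss with soft labels, can be sketched as follows. This is a minimal illustration only: the function names, the cross-entropy formulation, and the weighting factor `alpha` are assumptions, not the paper's actual formulation.

```python
import math

def cross_entropy(pred, target):
    """Cross-entropy between a predicted distribution and a target
    distribution (both lists of probabilities summing to 1).
    Works for hard (one-hot) and soft targets alike."""
    eps = 1e-12  # avoid log(0)
    return -sum(t * math.log(p + eps) for p, t in zip(pred, target))

def mtl_loss(scene_pred, scene_hard, group_pred, group_soft, alpha=0.5):
    """Illustrative multitask objective: primary ASC loss with hard
    scene labels plus an auxiliary loss with soft group labels
    (e.g. produced by a topic model). `alpha` is a hypothetical
    task-weighting hyperparameter."""
    primary = cross_entropy(scene_pred, scene_hard)      # hard one-hot target
    auxiliary = cross_entropy(group_pred, group_soft)    # soft target
    return primary + alpha * auxiliary
```

A perfect prediction on the primary task drives the first term toward zero, while the soft auxiliary target still contributes a gradient signal, which is the mechanism by which the auxiliary task regularizes the primary one.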
