Scene-Dependent Acoustic Event Detection with Scene Conditioning and Fake-Scene-Conditioned Loss

Tatsuya Komatsu, Keisuke Imoto, Masahito Togami

doi:10.1109/icassp40776.2020.9053702

Abstract

In this paper, we propose scene-dependent acoustic event detection (AED) with scene conditioning and a fake-scene-conditioned loss. The proposed method employs a multitask network that has not only an AED part but also an acoustic scene classification (ASC) part. The scenes predicted by ASC are used as an additional feature for scene conditioning of AED, so that the network learns the relationship between scenes and events. For efficient training, the proposed method incorporates a new AED loss function, the fake-scene-conditioned loss, in addition to the conventional AED loss. During training, the AED part is conditioned on fake scenes as well as on predicted and true scenes. The fake-scene-conditioned loss is calculated between the fake-scene-conditioned AED results and the event labels from which events that do not exist in the fake scenes have been removed. Training with combinations of true scenes and events, i.e., with the conventional AED loss, only reveals that an event is present in a scene; with the fake-scene-conditioned loss, the proposed method can also learn that an event is absent in a scene. Experimental results show that the proposed method improves AED performance compared with the baseline, with a 23% increase in the F1 score and a 56% decrease in the false alarm rate for scenes where no event exists.
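As an illustration of the label masking described above, the following is a minimal sketch and not the authors' implementation. It assumes that the events that can exist in each scene are given as a binary scene-event table (`scene_event_mask`), that both losses are binary cross-entropy, and that they are summed with a weight (`weight`); all function names, the table, and the weight are hypothetical choices made for this example.

```python
import numpy as np

def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between event probabilities and 0/1 labels."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(probs)
                          + (1.0 - targets) * np.log(1.0 - probs)))

def fake_scene_labels(event_labels, scene_event_mask, fake_scene):
    """Remove labels of events that do not exist in the fake scene.

    scene_event_mask[s, k] is 1 if event k can occur in scene s (assumed table).
    """
    return event_labels * scene_event_mask[fake_scene]

def combined_aed_loss(probs_true, probs_fake, event_labels,
                      scene_event_mask, fake_scene, weight=1.0):
    """Conventional AED loss plus the (assumed weighted) fake-scene-conditioned loss.

    probs_true: AED output when conditioned on the true scene.
    probs_fake: AED output when conditioned on the fake scene.
    """
    conventional = binary_cross_entropy(probs_true, event_labels)
    masked = fake_scene_labels(event_labels, scene_event_mask, fake_scene)
    fake = binary_cross_entropy(probs_fake, masked)
    return conventional + weight * fake

# Hypothetical toy setup: 2 scenes, 3 events.
scene_event_mask = np.array([[1.0, 1.0, 0.0],   # scene 0 allows events 0 and 1
                             [0.0, 1.0, 1.0]])  # scene 1 allows events 1 and 2
event_labels = np.array([1.0, 1.0, 0.0])        # clip recorded in scene 0

# Conditioning on fake scene 1 removes event 0, which does not exist there.
masked_labels = fake_scene_labels(event_labels, scene_event_mask, fake_scene=1)

# An AED output that matches both targets yields a near-zero combined loss.
loss = combined_aed_loss(event_labels, masked_labels, event_labels,
                         scene_event_mask, fake_scene=1)
```

In this toy case the masked target tells the network that event 0 should be reported as absent when the conditioning scene is scene 1, which is the kind of absence information that the conventional loss alone does not provide.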

Full Text