Abstract

In this paper, an improved learning system is proposed for the sound event detection (SED) task, typically in domestic environments. If the detailed timestamps of all sound events in the audio files are available in the training dataset, the SED system can be trained in a fully supervised manner, similar to the methods applied in audio tagging or acoustic scene classification tasks. In practice, however, acquiring large-scale, fully labeled SED datasets is extremely difficult because of the high cost of labor and the need for professional expertise. How to solve the SED problem by making full use of unlabeled and weakly labeled audio data has therefore attracted considerable attention in this research field. In this work, an improved weakly supervised learning framework based on a convolutional recurrent neural network (CRNN) is employed to estimate sound event timestamps using a small fully labeled dataset together with large-scale weakly labeled and unlabeled datasets. A mixup-based data augmentation technique and an automatic threshold searching strategy for sound events are also proposed to enhance system performance. The system is evaluated on the DCASE 2019 SED dataset, achieving up to 23.79% class-wise average F1-score for the detection of sound events.
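The mixup-based augmentation mentioned above can be illustrated with a minimal sketch: two training examples and their label vectors are blended with a weight drawn from a Beta distribution. The abstract does not specify the paper's exact formulation or hyperparameters, so the `alpha` value and the spectrogram/label shapes below are illustrative assumptions only.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Blend two examples and their labels (mixup augmentation).

    alpha=0.2 is a hypothetical default; the paper does not state
    which value it uses.
    """
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

# Example: mix two log-mel spectrogram patches with their weak tags.
# Shapes (64 mel bins x 128 frames, 3 classes) are illustrative.
spec_a = np.random.rand(64, 128)
spec_b = np.random.rand(64, 128)
tags_a = np.array([1.0, 0.0, 0.0])
tags_b = np.array([0.0, 1.0, 0.0])

mixed_spec, mixed_tags = mixup(spec_a, tags_a, spec_b, tags_b)
```

Because the mixing weight and its complement sum to one, the mixed label vector remains a valid soft target for the weakly labeled tagging objective.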
