Abstract

In this paper, an improved learning system is proposed for the sound event detection (SED) task, typically in domestic environments. If the detailed timestamps of all sound events in the audio files are available in the training dataset, the SED system can be trained in a fully supervised manner, similar to the methods applied in audio tagging or acoustic scene classification tasks. In practice, however, acquiring large-scale, fully labeled SED datasets is extremely difficult because of the high cost of labor and the need for professional expertise. How to solve the SED problem by making full use of unlabeled and weakly labeled audio data has therefore attracted considerable attention in this research field. In this work, an improved weakly supervised learning framework based on a convolutional recurrent neural network (CRNN) is employed to estimate sound event timestamps using a small fully labeled dataset together with large-scale weakly labeled and unlabeled datasets. A mixup-based data augmentation technique and an automatic threshold searching strategy for sound events are also proposed to enhance system performance. The system is evaluated on the DCASE 2019 SED dataset, achieving up to 23.79% class-wise average F1-score for the detection of sound events.
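The mixup-based augmentation mentioned above can be illustrated with a minimal sketch: two training examples and their label vectors are blended with a weight drawn from a Beta distribution. The abstract does not specify the paper's exact formulation or hyperparameters, so the `alpha` value and the spectrogram/label shapes below are illustrative assumptions only.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Blend two examples and their labels (mixup augmentation).

    alpha=0.2 is a hypothetical default; the paper does not state
    which value it uses.
    """
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

# Example: mix two log-mel spectrogram patches with their weak tags.
# Shapes (64 mel bins x 128 frames, 3 classes) are illustrative.
spec_a = np.random.rand(64, 128)
spec_b = np.random.rand(64, 128)
tags_a = np.array([1.0, 0.0, 0.0])
tags_b = np.array([0.0, 1.0, 0.0])

mixed_spec, mixed_tags = mixup(spec_a, tags_a, spec_b, tags_b)
```

Because the mixing weight and its complement sum to one, the mixed label vector remains a valid soft target for the weakly labeled tagging objective.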
