Abstract

Sound Event Detection is a task of rising relevance in recent years in the field of audio signal processing, owing to the creation of specific datasets such as Google AudioSet or DESED (Domestic Environment Sound Event Detection) and the introduction of competitive evaluations like the DCASE Challenge (Detection and Classification of Acoustic Scenes and Events). The different categories of acoustic events can present diverse temporal and spectral characteristics; however, most approaches use a fixed time-frequency resolution to represent the audio segments. This work proposes a multi-resolution analysis for feature extraction in Sound Event Detection, hypothesizing that different resolutions can be more adequate for detecting different sound event categories, and that combining the information provided by multiple resolutions could improve the performance of Sound Event Detection systems. Experiments are carried out on the DESED dataset in the context of the DCASE 2020 Challenge, concluding that the combination of up to five resolutions allows a neural network-based system to outperform single-resolution models in terms of event-based F1-score in every event category, as well as in terms of PSDS (Polyphonic Sound Detection Score). Furthermore, we analyze the impact of score thresholding on the computation of F1-score results, finding that the standard value of 0.5 is suboptimal and proposing an alternative strategy based on the use of a specific threshold for each event category, which yields further performance improvements.
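
As an illustration of the class-wise thresholding strategy described above, the following minimal sketch binarizes frame-level detection scores with one threshold per DESED event category instead of a single global 0.5. The `binarize` helper and the threshold values are hypothetical placeholders for illustration, not the values selected in the paper.

```python
import numpy as np

# The ten DESED event categories.
CLASSES = ["Alarm_bell_ringing", "Blender", "Cat", "Dishes", "Dog",
           "Electric_shaver_toothbrush", "Frying", "Running_water",
           "Speech", "Vacuum_cleaner"]

def binarize(scores, thresholds):
    """Turn frame-level class scores into binary activity decisions,
    applying one decision threshold per event category."""
    thresholds = np.asarray(thresholds)           # shape: (n_classes,)
    return scores >= thresholds[np.newaxis, :]    # shape: (n_frames, n_classes)

# Dummy sigmoid outputs for one clip (one row per analysis frame).
scores = np.random.rand(625, len(CLASSES))

# Standard global threshold vs. class-specific thresholds
# (placeholder values, not those reported in the paper).
uniform = binarize(scores, [0.5] * len(CLASSES))
per_class = binarize(scores, [0.45, 0.55, 0.40, 0.30, 0.50,
                              0.35, 0.60, 0.35, 0.65, 0.50])
```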

Highlights

  • Understanding the acoustic environment is an ongoing challenge for artificial intelligence which has motivated several research fields

  • Since the BS resolution point coincides with the baseline system of DCASE 2020 Challenge Task 4, the results obtained at this resolution constitute the common benchmark for the task

  • The results of the single-resolution models are presented in Table 3 as the mean and standard deviation of the F1-scores obtained across the five training runs

Summary

INTRODUCTION

Understanding the acoustic environment is an ongoing challenge for artificial intelligence that has motivated several research fields. The use of two different resolutions has been proposed to improve automatic speech recognition in reverberant scenarios [23], in which a wide-context window provides information about the acoustic environment and reverberation, whereas a narrow-context window provides finer detail about the content of the speech signal. This is possible due to the tradeoff between time resolution and frequency resolution in the extraction of Fast Fourier Transform-based audio features [24], such as the mel-spectrogram, which is the basis for the analysis proposed in this work. The proposed analysis is tested using a state-of-the-art system, the baseline for DCASE 2020 Challenge Task 4, “Detection and Separation of Sound Events in Domestic Environments” [25]. The aim of this challenge is to make use of unlabeled and weakly-labeled recordings, together with strongly-labeled synthetic audio clips, to train systems that predict the temporal locations of ten different event categories in audio recordings.
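
As a minimal sketch of this time-frequency tradeoff, the snippet below computes log-mel spectrograms of the same clip at several resolution points using librosa: longer analysis windows sharpen frequency detail at the cost of time detail, and shorter windows do the opposite. The resolution names and the specific window/hop sizes are illustrative assumptions (only BS is meant to mimic a baseline-like configuration), not the exact settings used in the paper.

```python
import numpy as np
import librosa

# Illustrative resolution points: shorter windows favor time detail,
# longer windows favor frequency detail. All values are assumptions.
RESOLUTIONS = {
    "T++": dict(n_fft=512,  hop_length=64),
    "T+":  dict(n_fft=1024, hop_length=128),
    "BS":  dict(n_fft=2048, hop_length=256),   # baseline-like setting
    "F+":  dict(n_fft=4096, hop_length=512),
    "F++": dict(n_fft=8192, hop_length=1024),
}

def multi_resolution_mels(path, sr=16000, n_mels=64):
    """Compute one log-mel spectrogram of the clip per resolution point."""
    y, _ = librosa.load(path, sr=sr)
    feats = {}
    for name, params in RESOLUTIONS.items():
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels, **params)
        feats[name] = librosa.power_to_db(mel, ref=np.max)
    return feats

# Each entry has shape (n_mels, n_frames); n_frames shrinks as the hop
# length grows, reflecting the time-frequency resolution tradeoff.
features = multi_resolution_mels("clip.wav")
```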

DESED DATASET
EXPERIMENTAL FRAMEWORK
MODEL FUSION
SCORE POST-PROCESSING
RESULTS AND DISCUSSION
SCORE THRESHOLDING RESULTS
EVENT OVERLAP ANALYSIS
CONCLUSION