Abstract

Sound event detection (SED) is of great practical and research significance owing to its wide range of applications. However, due to the heavy reliance on dataset size for task performance, there is often a severe lack of data in real-world scenarios. In this study, an improved mean teacher model is utilized to carry out semi-supervised SED, and a perturbed residual recurrent neural network (P-RRNN) is proposed as the SED network. The residual structure is employed to alleviate the problem of network degradation, and pre-training the improved model on the ImageNet dataset enables it to learn information that is beneficial for event detection, thus improving the performance of SED. In the post-processing stage, a customized median filter group with a specific window length is designed to effectively smooth each type of event and minimize the impact of background noise on detection accuracy. Experimental results conducted on the publicly available Detection and Classification of Acoustic Scenes and Events 2019 Task 4 dataset demonstrate that the P-RRNN used for SED in this study can effectively enhance the detection capability of the model. The detection system achieves a Macro Event-based F1 score of 38.8% on the validation set and 40.5% on the evaluation set, indicating that the proposed method can adapt to complex and dynamic SED scenarios.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call