Research on Semi-Supervised Sound Event Detection Based on Mean Teacher Models Using ML-LoBCoD-NET

Jinjia Wang, Qian Yang, Yuzhen Zhang, Jing Xia

doi:10.1109/access.2020.2974479

Journal: IEEE Access	Publication Date: Jan 1, 2020
Citations: 7	License type: CC BY 4.0

Affiliation: Yanshan University

Abstract

The most common methods for sound event detection are the conventional convolutional neural network (CNN), the convolutional recurrent neural network (CRNN), and their variants. However, the pooling operation in a CNN loses the location information of the target object. We therefore drop the pooling operation, keeping only the ReLU and convolution operations, and instead impose the strong dictionary constraints and the penalty-function prior constraints of multi-layer convolutional sparse coding (ML-CSC). We propose an iterative deep neural network, the unfolded multi-layer local block coordinate descent network (ML-LoBCoD-NET), derived from the multi-layer local block coordinate descent algorithm (ML-LoBCoD), which in turn extends the local block coordinate descent (LoBCoD) algorithm. The ML-LoBCoD-NET extracts features that differ from those of a CNN. More importantly, for the weakly-supervised sound event detection task, we propose the MRNN-Att network, which combines the ML-LoBCoD-NET, a recurrent neural network (RNN), and an attention network. The MCRNN-Att network combines the MRNN-Att and CRNN networks to fuse their different features. Furthermore, for the semi-supervised sound event detection task, we propose the MRNN-Att mean teacher model (MRNN-Att-MT) and the MCRNN-Att mean teacher model (MCRNN-Att-MT), which use the MRNN-Att and MCRNN-Att networks, respectively, as the student model. These models were tested on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Task 4 dataset. The MRNN-Att-MT achieved an F1 score of 22.83% on the development set, 8.77 percentage points higher than the baseline system, and 15.68% on the evaluation set, 4.88 percentage points higher than the baseline. The MCRNN-Att-MT achieved an F1 score of 20.35% on the development set, 6.29 percentage points higher than the baseline, and 14.56% on the evaluation set, 3.76 percentage points higher than the baseline.
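The abstract's key architectural choice, convolution plus ReLU with no pooling, interpreted through ML-CSC, can be illustrated with the layered soft-thresholding pursuit, the simplest ML-CSC pursuit, in which each convolution followed by a biased ReLU computes one layer's sparse code. This is a minimal sketch under stated assumptions, not the authors' ML-LoBCoD: ML-LoBCoD-NET unfolds iterations of a local block coordinate descent algorithm, which is not reproduced here. The 1-D layout (log-mel bands as channels, frames along time), the filter sizes, the thresholds, and all function names are hypothetical.

```python
import numpy as np


def nonneg_soft_threshold(z, b):
    """Non-negative soft thresholding S+_b(z) = max(z - b, 0), i.e. a ReLU with bias -b."""
    return np.maximum(z - b, 0.0)


def apply_dict_transpose(x, filters):
    """Apply D^T for a 1-D convolutional dictionary (cross-correlation, 'same' padding).

    x:       (in_channels, T) input signal or previous-layer sparse code.
    filters: (out_channels, in_channels, k) with odd k.
    Returns  (out_channels, T); no pooling, so every time step is kept.
    """
    out_ch, in_ch, k = filters.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))  # zero-pad the time axis only
    T = x.shape[1]
    out = np.empty((out_ch, T))
    for t in range(T):
        window = xp[:, t:t + k]  # (in_channels, k) slice centred on frame t
        out[:, t] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
    return out


def layered_thresholding(x, dictionaries, biases):
    """ML-CSC layered pursuit: Gamma_i = S+_{b_i}(D_i^T Gamma_{i-1}), with Gamma_0 = x."""
    gamma = x
    codes = []
    for D, b in zip(dictionaries, biases):
        gamma = nonneg_soft_threshold(apply_dict_transpose(gamma, D), b[:, None])
        codes.append(gamma)
    return codes


# Hypothetical sizes: 64 log-mel bands used as input channels, 100 frames.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 100))
D1 = 0.1 * rng.standard_normal((32, 64, 3))
D2 = 0.1 * rng.standard_normal((16, 32, 3))
codes = layered_thresholding(x, [D1, D2], [np.full(32, 0.1), np.full(16, 0.1)])
print([c.shape for c in codes])  # [(32, 100), (16, 100)]
```

Because no pooling is applied, each layer's code keeps one column per input frame, which is the property the abstract relies on to preserve where events occur in time.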

Highlights

People rely on sounds in the environment to obtain important information.
The MRNN-Att network is based on the multi-layer local block coordinate descent network (ML-LoBCoD-NET), which is obtained by unfolding the ML-LoBCoD algorithm.
The F1 score of the MCRNN-Att model was 0.45% higher than that of the GCRNN-Att model. These results indicate that the MRNN-Att and MCRNN-Att models outperformed the GCRNN-Att model and that the features extracted by the ML-LoBCoD-NET were effective.
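The highlights describe MRNN-Att as an ML-LoBCoD-NET front end followed by an RNN and an attention network. In weakly-supervised sound event detection, an attention network is commonly used to pool frame-level predictions into a clip-level prediction that can be trained against clip-level labels. The sketch below shows one common form of attention pooling, a per-class softmax over time that weights sigmoid frame probabilities. The exact attention formulation of MRNN-Att is not given here, so this form, the function names, and the shapes are assumptions; random logits stand in for the RNN outputs.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_pool(cls_logits, att_logits):
    """Aggregate frame-level predictions into clip-level predictions with attention.

    cls_logits, att_logits: (T, C) frame-wise logits for C event classes.
    Returns frame_probs (T, C) for time-stamped predictions and
    clip_probs (C,) to be trained against the weak clip-level labels.
    """
    frame_probs = sigmoid(cls_logits)
    # Per-class softmax over time; subtracting the max keeps exp() stable.
    a = np.exp(att_logits - att_logits.max(axis=0, keepdims=True))
    weights = a / a.sum(axis=0, keepdims=True)
    clip_probs = (weights * frame_probs).sum(axis=0)
    return frame_probs, clip_probs


def weak_bce(clip_probs, clip_labels, eps=1e-7):
    """Binary cross-entropy between clip-level probabilities and weak (clip-level) labels."""
    p = np.clip(clip_probs, eps, 1.0 - eps)
    return float(-np.mean(clip_labels * np.log(p) + (1 - clip_labels) * np.log(1 - p)))


# Hypothetical example: 500 frames, 10 classes (DCASE 2018 Task 4 has 10 event classes).
rng = np.random.default_rng(1)
cls_logits = rng.standard_normal((500, 10))
att_logits = rng.standard_normal((500, 10))
frame_probs, clip_probs = attention_pool(cls_logits, att_logits)
labels = np.zeros(10)
labels[[0, 3]] = 1.0  # the clip contains classes 0 and 3, with no time stamps
loss = weak_bce(clip_probs, labels)
```

Since the attention weights for each class sum to one over time, every clip-level probability is a weighted average of that class's frame probabilities, so the clip loss still sends a training signal back to the individual frames.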

Summary

INTRODUCTION

People rely on sounds in the environment to obtain important information. Sound event detection (SED) detects specific audio events in audio recordings, estimates the onset and offset times of each sound event, and assigns a label to each event. […] trained, and the neural networks can output the results [5]. Inspired by the use of the mean teacher model for semi-supervised problems, this paper proposes two mean teacher models for sound event detection in domestic environments. The first proposed mean teacher model is the MRNN-Att-MT, whose student model is the MRNN-Att. The second is the MCRNN-Att-MT, whose student model is the MCRNN-Att. Weakly-labeled sound event detection is the core problem, and the proposed MRNN-Att network is the core method of this paper.
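The introduction builds on the mean teacher scheme for semi-supervised learning. In the general mean teacher method of Tarvainen and Valpola, the teacher's weights are an exponential moving average (EMA) of the student's, and the student is trained on a supervised loss plus a consistency cost that penalizes disagreement with the teacher on labeled and unlabeled clips alike. The sketch below shows those pieces on plain NumPy parameter dictionaries. The decay `alpha`, the mean-squared-error consistency cost, the exp(-5(1 - t)^2) ramp-up, `max_weight`, and the function names are assumptions taken from the generic method, not settings reported for MRNN-Att-MT or MCRNN-Att-MT.

```python
import numpy as np


def ema_update(teacher, student, alpha=0.999):
    """Mean teacher update, per parameter: teacher <- alpha * teacher + (1 - alpha) * student."""
    for name, w in student.items():
        teacher[name] = alpha * teacher[name] + (1.0 - alpha) * w
    return teacher


def consistency_cost(student_probs, teacher_probs):
    """Mean squared error between student and teacher predictions; the teacher output is a fixed target."""
    return float(np.mean((student_probs - teacher_probs) ** 2))


def sigmoid_rampup(step, rampup_steps):
    """Ramp-up factor exp(-5 * (1 - t)^2), with t = step / rampup_steps clipped to [0, 1]."""
    if rampup_steps <= 0:
        return 1.0
    t = min(max(step / rampup_steps, 0.0), 1.0)
    return float(np.exp(-5.0 * (1.0 - t) ** 2))


def mean_teacher_loss(sup_loss, student_probs, teacher_probs, step, rampup_steps, max_weight=2.0):
    """Supervised loss on labeled clips plus the ramped-up consistency cost on all clips."""
    weight = max_weight * sigmoid_rampup(step, rampup_steps)
    return sup_loss + weight * consistency_cost(student_probs, teacher_probs)


# Toy parameters: after one update with alpha = 0.9 the teacher moves 10% toward the student.
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
ema_update(teacher, student, alpha=0.9)
```

In training, only the student receives gradients from `mean_teacher_loss`; `ema_update` is called after each optimizer step, so the teacher is a smoothed average of recent student weights rather than a separately trained network.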

BACKGROUND

THE PROPOSED ML-LoBCoD ALGORITHM

THE PROPOSED MRNN-ATT-MT MODEL FOR SOUND

EXPERIMENTAL RESULTS AND ANALYSIS

Findings

CONCLUSION

Similar Papers

Adaptive Memory-Controlled Self-Attention for Polyphonic Sound Event Detection
Mei Wang ... Yu Yao
Symmetry | VOL. 14 | 12 Feb 2022

Hodge and Podge: Hybrid Supervised Sound Event Detection with Multi-Hot MixMatch and Composition Consistence Training
Ziqiang Shi ... Rujie Liu
24 Jan 2021

Sound Event Detection of Weakly Labelled Data With CNN-Transformer and Automatic Threshold Optimization
Qiuqiang Kong ... Wenwu Wang
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 28 | 01 Jan 2020

Large-Scale Weakly Supervised Audio Classification Using Gated Convolutional Neural Network
Yong Xu ... Wenwu Wang
01 Apr 2018