Abstract
The lack of strongly labeled data can limit the potential of a Sound Event Detection (SED) system trained using deep learning approaches. To address this issue, this paper proposes a novel method to approximate strong labels for weakly labeled data using Nonnegative Matrix Factorization (NMF) in a supervised manner. Within a combined transfer learning and semi-supervised learning framework, two different Convolutional Neural Networks (CNNs) are trained on synthetic data, the approximated strongly labeled data, and unlabeled data, where one model produces the audio tags and the other produces the frame-level predictions. The proposed methodology is evaluated on three subsets of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 dataset: the validation dataset, the challenge evaluation dataset, and the public YouTube evaluation dataset. Our proposed methodology outperforms the baseline system by at least 7% across these three subsets. It also outperforms the top three submissions from DCASE 2019 challenge task 4 on the validation and public YouTube evaluation datasets, and its performance is competitive with the top submission of DCASE 2020 challenge task 4 on the challenge evaluation data. A post-challenge analysis on the validation dataset revealed the causes of the performance difference between our system and the top DCASE 2020 task 4 submission: 1) the detection threshold tuning method and 2) the augmentation techniques used. By changing our detection threshold tuning method, our system performs better than the first-place submission by 1.5%. The post-challenge analysis also revealed that our system is more robust than the top DCASE 2020 task 4 submission on long-duration audio clips, where we outperform it by 37%.
Highlights
An auditory scene is made up of several different sound events that overlap in time and frequency, resulting in a complex array of acoustic information reaching the human ear
This paper focuses on Sound Event Detection (SED), which reflects an aspect of the human auditory system
We propose a novel methodology to label the weakly labeled data, where only the event tags are known with certainty, using Nonnegative Matrix Factorization (NMF) [14] in a supervised manner
Summary
An auditory scene is made up of several different sound events that overlap in time and frequency, resulting in a complex array of acoustic information reaching the human ear. SED can be divided into two subtasks: the first identifies which sound event classes are present in an audio clip (audio tagging), whereas the second localizes each event in time [3]. Such a problem is more likely to be solved when one has access to a large corpus of strongly labeled data, where the event tags and their corresponding onsets and offsets are known with certainty. However, proposals submitted to the annual Detection and Classification of Acoustic Scenes and Events (DCASE) challenge task 4 indicate that weak labels can be an effective alternative for training a SED system [6]–[9]. We propose a novel methodology to label the weakly labeled data, where only the event tags are known with certainty, using NMF [14] in a supervised manner.
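To make the idea of supervised NMF label approximation concrete, the sketch below illustrates one way it could be realized: a nonnegative spectral basis is learned per event class from strongly labeled (e.g. synthetic) data, the spectrogram of a weakly labeled clip is then decomposed against the bases of its tagged classes only, and the resulting per-class activations are thresholded into frame-level activity. The function names, the number of components per class, and the threshold value are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.decomposition import NMF

# --- Illustrative settings (assumptions, not taken from the paper) ---
N_COMPONENTS_PER_CLASS = 8   # spectral templates learned per event class
ACTIVATION_THRESHOLD = 0.5   # relative threshold on normalized activations


def fit_class_basis(class_spectrograms, n_components=N_COMPONENTS_PER_CLASS):
    """Learn a nonnegative spectral basis for one event class.

    class_spectrograms: list of (n_freq, n_frames) magnitude spectrograms
    cut from strongly labeled data for that class.
    Returns a (n_freq, n_components) basis matrix.
    """
    V = np.concatenate(class_spectrograms, axis=1)   # stack frames of all examples
    model = NMF(n_components=n_components, init="nndsvda", max_iter=500)
    W = model.fit_transform(V)                        # V ~ W @ H
    return W


def activations_with_fixed_basis(V, W, n_iter=200, eps=1e-10):
    """Solve V ~ W @ H for H only, keeping the basis W fixed
    (Lee-Seung multiplicative updates for the Frobenius loss)."""
    H = np.random.rand(W.shape[1], V.shape[1]) + eps
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)
    return H


def approximate_strong_labels(V, weak_tags, class_bases,
                              threshold=ACTIVATION_THRESHOLD):
    """Approximate frame-level labels for one weakly labeled clip.

    V          : (n_freq, n_frames) magnitude spectrogram of the clip
    weak_tags  : list of event classes known to occur in the clip
    class_bases: dict mapping class name -> (n_freq, n_components) basis
    Returns a dict mapping class name -> boolean per-frame activity mask.
    """
    # Concatenate the bases of the tagged classes only (supervised decomposition).
    W = np.concatenate([class_bases[c] for c in weak_tags], axis=1)
    H = activations_with_fixed_basis(V, W)

    labels, offset = {}, 0
    for c in weak_tags:
        k = class_bases[c].shape[1]
        act = H[offset:offset + k].sum(axis=0)        # per-class activation energy
        offset += k
        act /= act.max() + 1e-10                      # normalize to [0, 1]
        labels[c] = act > threshold                   # frame-level activity mask
    return labels
```

Runs of consecutive active frames in each mask can then be converted to onset and offset times using the spectrogram hop length, yielding the approximated strongly labeled data referred to in the abstract.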