Sound Event Detection of Weakly Labelled Data With CNN-Transformer and Automatic Threshold Optimization

Qiuqiang Kong, Wenwu Wang, Mark D Plumbley, Yong Xu

doi:10.1109/taslp.2020.3014737

Abstract

Sound event detection (SED) is the task of detecting sound events in an audio recording. One challenge of SED is that many datasets, such as the Detection and Classification of Acoustic Scenes and Events (DCASE) datasets, are weakly labelled: each audio clip has only audio tags, without the onset and offset times of sound events. We compare segment-wise and clip-wise training for SED, a comparison that is lacking in previous work. We propose a convolutional neural network transformer (CNN-Transformer) for audio tagging and SED, and show that the CNN-Transformer performs similarly to a convolutional recurrent neural network (CRNN). Another challenge of SED is that thresholds are required for detecting sound events. Previous works set thresholds empirically, which is not an optimal approach. To solve this problem, we propose an automatic threshold optimization method with two stages. The first stage optimizes the system with respect to metrics that do not depend on thresholds, such as mean average precision (mAP). The second stage optimizes the thresholds with respect to metrics that depend on those thresholds. Our proposed automatic threshold optimization system achieves a state-of-the-art audio tagging F1 of 0.646, compared with 0.629 without threshold optimization, and a sound event detection F1 of 0.584, compared with 0.564 without threshold optimization.
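The second stage described in the abstract can be illustrated with a minimal sketch. It assumes a model has already been trained (stage one) and has produced clip-wise probabilities for a validation set. The abstract does not specify the optimizer, so this sketch substitutes a simple per-class grid search that maximizes per-class F1; it is not the authors' procedure. The names `f1_score_binary`, `optimize_thresholds`, and `macro_f1`, and the synthetic data, are hypothetical and chosen for illustration only.

```python
import numpy as np

def f1_score_binary(y_true, y_pred):
    """F1 for one class, given 0/1 targets and 0/1 predictions."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

def optimize_thresholds(probs, targets, candidates=None):
    """Pick one threshold per class by grid search over `candidates`.

    probs:   (n_clips, n_classes) predicted probabilities in [0, 1]
    targets: (n_clips, n_classes) weak labels (0/1 audio tags)
    Assumption: grid search stands in for whatever optimizer the paper uses.
    """
    if candidates is None:
        candidates = np.arange(1, 20) / 20.0  # 0.05, 0.10, ..., 0.95
    n_classes = probs.shape[1]
    thresholds = np.full(n_classes, 0.5)
    for k in range(n_classes):
        scores = [
            f1_score_binary(targets[:, k], (probs[:, k] >= t).astype(int))
            for t in candidates
        ]
        thresholds[k] = candidates[int(np.argmax(scores))]
    return thresholds

def macro_f1(probs, targets, thresholds):
    """Average of per-class F1 after applying per-class thresholds."""
    preds = (probs >= thresholds).astype(int)
    per_class = [
        f1_score_binary(targets[:, k], preds[:, k])
        for k in range(probs.shape[1])
    ]
    return float(np.mean(per_class))

if __name__ == "__main__":
    # Synthetic stand-in for validation outputs of an already trained model.
    rng = np.random.default_rng(0)
    targets = (rng.random((500, 4)) < 0.3).astype(int)
    probs = np.clip(0.3 * targets + 0.5 * rng.random((500, 4)), 0.0, 1.0)

    fixed = np.full(4, 0.5)
    tuned = optimize_thresholds(probs, targets)
    print("macro F1, fixed 0.5 thresholds:", macro_f1(probs, targets, fixed))
    print("macro F1, tuned thresholds:    ", macro_f1(probs, targets, tuned))
```

In practice the thresholds would be tuned on held-out validation data and then applied unchanged to the test set. The demo above tunes and scores on the same synthetic data only to keep the sketch short.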

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2020
Citations: 108