Joint framework with deep feature distillation and adaptive focal loss for weakly supervised audio tagging and acoustic event detection

Yunhao Liang, Yanhua Long, Yijie Li, Jiaen Liang, Yuping Wang

doi:10.1016/j.dsp.2022.103446

Abstract

A well-designed joint training framework can improve the performance of weakly supervised audio tagging (AT) and acoustic event detection (AED) simultaneously. In this study, we propose three methods to improve the best teacher-student framework from the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Task 4 for both the AT and AED tasks. First, we propose a frame-level, target-event-based deep feature distillation, which aims to leverage the limited strongly labeled data within the weakly supervised framework to learn better intermediate feature maps. Second, we propose an adaptive focal loss and a two-stage training strategy that enable more effective and accurate model training, in which the contributions of hard and easy acoustic events to the total cost function are adjusted automatically. Third, we design an event-specific post-processing step to improve the prediction of target event timestamps. Experiments on the public DCASE 2019 Task 4 dataset show that our approach achieves competitive performance in both the AT (81.2% F1-score) and AED (49.8% F1-score) tasks.
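
The abstract does not give the formula for the adaptive focal loss, so the following is only a minimal sketch of the standard (non-adaptive) binary focal loss that such a loss presumably builds on. The function names, the fixed `gamma` value, and the NumPy formulation are assumptions made for illustration; the paper's automatic adjustment of the hard/easy weighting is not reproduced here.

```python
import numpy as np


def binary_focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Standard binary focal loss, averaged over all entries.

    probs   : predicted event probabilities in [0, 1]
    targets : binary labels (1 = event active, 0 = inactive)
    gamma   : focusing parameter (fixed here; an illustrative assumption)
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    targets = np.asarray(targets, dtype=float)
    # p_t is the probability the model assigns to the correct label.
    p_t = np.where(targets == 1.0, probs, 1.0 - probs)
    # (1 - p_t) ** gamma down-weights easy (confidently correct) entries,
    # so hard entries contribute relatively more to the total loss.
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Plain binary cross-entropy, i.e. the focal loss with gamma = 0."""
    return binary_focal_loss(probs, targets, gamma=0.0, eps=eps)
```

With `gamma = 0` the weighting term equals 1 and the loss reduces to binary cross-entropy. Larger values of `gamma` shrink the contribution of easy examples more strongly, which is the hard/easy balance that the adaptive variant described above is said to tune automatically.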


Journal: Digital Signal Processing
Publication Date: Jan 30, 2022
Citations: 7

