Abstract

Sound event detection (SED) methods are tasked with labeling segments of audio recordings by the presence of active sound sources. SED is typically posed as a supervised machine learning problem, requiring strong annotations for the presence or absence of each sound source at every time instant within the recording. However, strong annotations of this type are both labor- and cost-intensive for human annotators to produce, which limits the practical scalability of SED methods. In this paper, we treat SED as a multiple instance learning (MIL) problem, where training labels are static over a short excerpt, indicating the presence or absence of sound sources but not their temporal locality. The models, however, must still produce temporally dynamic predictions, which must be aggregated (pooled) when comparing against static labels during training. To facilitate this aggregation, we develop a family of adaptive pooling operators—referred to as autopool—which smoothly interpolate between common pooling operators, such as min-, max-, or average-pooling, and automatically adapt to the characteristics of the sound sources in question. We evaluate the proposed pooling operators on three datasets, and demonstrate that in each case, the proposed methods outperform nonadaptive pooling operators for static prediction, and nearly match the performance of models trained with strong, dynamic annotations. The proposed method is evaluated in conjunction with convolutional neural networks, but can be readily applied to any differentiable model for time-series label prediction. While this paper focuses on SED applications, the proposed methods are general, and could be applied widely to MIL problems in any domain.
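
To make the interpolation property concrete, below is a minimal sketch of a softmax-weighted pooling operator with a scalar sharpness parameter. The function name `autopool` and this exact parameterization are illustrative assumptions based on the behavior described above, not necessarily the paper's precise formulation; in a trained model, `alpha` would be a learnable (typically per-class) parameter rather than a fixed constant.

```python
import numpy as np

def autopool(p, alpha):
    """Pool frame-level probabilities p (shape: [frames]) into one clip-level score.

    A scaled-softmax weighted average with the interpolation behavior
    described in the abstract:
        alpha = 0      -> average pooling
        alpha -> +inf  -> max pooling
        alpha -> -inf  -> min pooling
    """
    scaled = alpha * p
    scaled = scaled - scaled.max()   # shift for numerical stability
    w = np.exp(scaled)               # softmax-style frame weights
    return float(np.sum(p * w) / np.sum(w))

p = np.array([0.1, 0.2, 0.9, 0.3])
print(autopool(p, 0.0))    # 0.375, the unweighted mean
print(autopool(p, 50.0))   # ~0.9, approaches the max
print(autopool(p, -50.0))  # ~0.1, approaches the min
```

Because the operator is differentiable in both the frame-level predictions and `alpha`, it can sit between any differentiable time-series model and a static clip-level loss, which is what allows the pooling behavior to be adapted during training.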
