Joint Spatio-Temporal-Frequency Representation Learning for Improved Sound Event Localization and Detection.

Baoqing Chen, Mei Wang, Yu Gu

doi:10.3390/s24186090

Abstract

Sound event localization and detection (SELD) is a crucial component of machine listening that aims to simultaneously identify and localize sound events in multichannel audio recordings. This task demands an integrated analysis of spatial, temporal, and frequency domains to accurately characterize sound events. The spatial domain pertains to the varying acoustic signals captured by multichannel microphones, which are essential for determining the location of sound sources. However, the majority of recent studies have focused on time-frequency correlations and spatio-temporal correlations separately, leading to inadequate performance in real-life scenarios. In this paper, we propose a novel SELD method that utilizes the newly developed Spatio-Temporal-Frequency Fusion Network (STFF-Net) to jointly learn comprehensive features across spatial, temporal, and frequency domains of sound events. The backbone of our STFF-Net is the Enhanced-3D (E3D) residual block, which combines 3D convolutions with a parameter-free attention mechanism to capture and refine the intricate correlations among these domains. Furthermore, our method incorporates the multi-ACCDOA format to effectively handle homogeneous overlaps between sound events. During the evaluation, we conduct extensive experiments on three de facto benchmark datasets, and our results demonstrate that the proposed SELD method significantly outperforms current state-of-the-art approaches.
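
To make the multi-ACCDOA output format concrete, here is a minimal decoding sketch. It is not the authors' implementation. It assumes the usual ACCDOA reading, in which each class on each track is a Cartesian vector whose length signals activity and whose direction gives the direction of arrival. The function name, the data layout, and the 0.5 activity threshold are illustrative assumptions, and any merging of near-duplicate tracks is omitted.

```python
import math


def decode_multi_accdoa(frame, threshold=0.5):
    """Decode one frame of multi-ACCDOA-style output (illustrative sketch).

    frame[track][cls] is an (x, y, z) vector. Its length is read as the
    activity of class `cls` on that track, its direction as the DOA.
    Returns a list of (cls, track, azimuth_deg, elevation_deg) tuples.
    The threshold value is an assumption, not taken from the paper.
    """
    events = []
    for track, per_class in enumerate(frame):
        for cls, (x, y, z) in enumerate(per_class):
            activity = math.sqrt(x * x + y * y + z * z)
            if activity <= threshold:
                continue  # vector too short: treated as inactive
            azimuth = math.degrees(math.atan2(y, x))
            elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
            events.append((cls, track, azimuth, elevation))
    return events


# Two tracks, two classes. Class 0 is active on both tracks with
# different directions, i.e. a same-class (homogeneous) overlap.
frame = [
    [(0.9, 0.0, 0.0), (0.1, 0.0, 0.0)],  # track 0
    [(0.0, 0.8, 0.0), (0.0, 0.0, 0.2)],  # track 1
]
events = decode_multi_accdoa(frame)
for event in events:
    print(event)
```

Because every track carries its own vector per class, two simultaneous events of the same class at different locations can both be reported, which a single vector per class cannot express.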

Journal: Sensors (Basel, Switzerland)
Publication Date: Sep 20, 2024
License type: cc-by

Similar Papers

A Sequence Matching Network for Polyphonic Sound Event Localization and Detection
Thi Ngoc Tho Nguyen ... Douglas L Jones
01 May 2020

DCASE 2021 Task 3: Spectrotemporally-aligned Features for Polyphonic Sound Event Localization and Detection
arXiv (Cornell University)
29 Jun 2021

A Model Ensemble Approach for Sound Event Localization and Detection
Qing Wang ... Yi Fang
24 Jan 2021

A Transpose-SELDNet for Polyphonic Sound Event Localization and Detection
Spoorthy V ... Shashidhar G Koolagudi
07 Apr 2023
