Abstract

Sound event detection (SED) and localization refer to recognizing sound events and estimating their spatial and temporal locations. Using neural networks has become the prevailing method for SED. In the area of sound localization, which is usually performed by estimating the direction of arrival (DOA), learning-based methods have recently been developed. In this paper, it is experimentally shown that the trained SED model is able to contribute to the direction of arrival estimation (DOAE). However, joint training of SED and DOAE degrades the performance of both. Based on these results, a two-stage polyphonic sound event detection and localization method is proposed. The method learns SED first, after which the learned feature layers are transferred for DOAE. It then uses the SED ground truth as a mask to train DOAE. The proposed method is evaluated on the DCASE 2019 Task 3 dataset, which contains different overlapping sound events in different environments. Experimental results show that the proposed method is able to improve the performance of both SED and DOAE, and also performs significantly better than the baseline method.
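The key training detail in the second stage is that the SED ground truth acts as a mask on the DOA regression loss, so the network is only penalized for direction estimates on frames where an event is actually active. Below is a minimal numpy sketch of that masking idea; the function name, array shapes, and angle convention are illustrative assumptions, not the paper's actual implementation (which trains a CRNN on the DCASE 2019 Task 3 data).

```python
import numpy as np

def masked_doa_loss(doa_pred, doa_true, sed_true):
    """Mean squared DOA error computed only over frame/class entries
    where the SED ground truth marks an event as active.

    doa_pred, doa_true: (frames, classes, 2) azimuth/elevation in degrees
    sed_true:           (frames, classes) binary activity mask
    (shapes are an assumption for this sketch)
    """
    mask = sed_true[..., None]            # broadcast mask over the angle axis
    sq_err = (doa_pred - doa_true) ** 2
    # Normalize by the number of active angle entries (2 per active
    # frame/class), not by the total size, so inactive frames neither
    # add error nor dilute it; guard against an all-inactive batch.
    return (sq_err * mask).sum() / np.maximum(mask.sum() * 2, 1)

# Toy example: one active frame and one inactive frame, one class.
doa_true = np.array([[[30.0, 10.0]], [[0.0, 0.0]]])
doa_pred = np.array([[[20.0, 10.0]], [[50.0, 50.0]]])  # badly wrong on the inactive frame
sed_true = np.array([[1.0], [0.0]])

# The large error on the inactive frame is ignored by the mask:
# only the active frame's (10^2 + 0^2) / 2 = 50 contributes.
loss = masked_doa_loss(doa_pred, doa_true, sed_true)
```

Masking the loss this way decouples the two subtasks: the DOA branch never has to learn to output a "no event" value, which is one reason the two-stage scheme can outperform joint training.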

Highlights

  • Sound event detection is a rapidly developing research area that aims to analyze and recognize a variety of sounds in urban and natural environments

  • The results of direction of arrival (DOA) estimation with and without transfer (DOA-NT) show that, with trained convolutional neural network (CNN) layers transferred, the DOA error is consistently lower than without transfer, indicating that sound event detection (SED) information contributes to direction of arrival estimation (DOAE) performance; convergence is also much faster with the CNN layers transferred

  • Comparing SELDnet with DOA-NT shows that joint training outperforms training DOAE without transferred CNN layers, which further confirms that SED contributes to DOAE


Summary

Introduction

Sound event detection is a rapidly developing research area that aims to analyze and recognize a variety of sounds in urban and natural environments. Owing to their success in image recognition, convolutional neural networks (CNNs) have become the prevailing architecture in this area [7,8,9,10]. Such methods use suitable time-frequency representations of audio, which are analogous to the image inputs in computer vision. Another popular type of neural network is the recurrent neural network (RNN), whose ability to learn long temporal patterns in the data makes it suitable for SED [11]. Hybrids containing both CNN and RNN layers, known as convolutional recurrent neural networks (CRNNs), have also been proposed and have led to state-of-the-art performance in SED [4, 12].


