Sound Event Localization and Detection Using CRNN on Pairs of Microphones

Francois Grondin, Mark Plumbley, James Glass, Iwona Sobieraj

doi:10.33682/4v2a-7q02

Abstract

This paper proposes a sound event localization and detection method for multichannel recordings. The proposed system uses two Convolutional Recurrent Neural Networks (CRNNs) to perform sound event detection (SED) and time difference of arrival (TDOA) estimation on each pair of microphones in a microphone array. The system is evaluated with a four-microphone array, and thus combines the results from six pairs of microphones to provide a final classification and a 3-D direction of arrival (DOA) estimate. Results demonstrate that the proposed approach outperforms the DCASE 2019 baseline system.
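To make the pairwise formulation concrete, the sketch below enumerates the six pairs of a four-microphone array and recovers a 3-D DOA from pairwise TDOAs with a far-field least-squares fit. It is an illustration under stated assumptions, not the combination rule used in the paper: the array geometry, the speed of sound, the TDOA sign convention, and the helper names `microphone_pairs` and `doa_from_tdoas` are all hypothetical, and the TDOAs are simulated without noise rather than estimated by a CRNN.

```python
from itertools import combinations

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the paper


def microphone_pairs(num_mics):
    """Return every unordered microphone pair (i, j) with i < j."""
    return list(combinations(range(num_mics), 2))


def doa_from_tdoas(mic_positions, tdoas, pairs, c=SPEED_OF_SOUND):
    """Least-squares far-field DOA from pairwise TDOAs (hypothetical helper).

    Convention assumed here: tdoas[k] = t_i - t_j for pairs[k] = (i, j),
    where t_i is the arrival time at microphone i. The returned unit
    vector points from the array toward the source.
    """
    positions = np.asarray(mic_positions, dtype=float)
    # Plane-wave model: t_i - t_j = (p_j - p_i) . u / c, one row per pair.
    A = np.array([(positions[j] - positions[i]) / c for i, j in pairs])
    u, *_ = np.linalg.lstsq(A, np.asarray(tdoas, dtype=float), rcond=None)
    return u / np.linalg.norm(u)


# Hypothetical tetrahedral array geometry (metres), for illustration only.
mics = 0.02 * np.array(
    [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float
)
pairs = microphone_pairs(4)  # four microphones give six pairs

# Simulate noise-free TDOAs for a known direction, then recover it.
true_doa = np.array([1.0, 2.0, 2.0]) / 3.0
tdoas = [(mics[j] - mics[i]) @ true_doa / SPEED_OF_SOUND for i, j in pairs]
estimate = doa_from_tdoas(mics, tdoas, pairs)
```

With six equations and three unknowns the system is overdetermined, so a least-squares fit of this kind averages out errors in individual pairwise estimates, provided the microphones are not coplanar.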

Highlights

Sound Event Detection (SED) is an important machine listening task, which aims to automatically recognize, label, and estimate the position in time of sound events in a continuous audio signal.
The prevailing architectures used for SED are Convolutional Neural Networks (CNNs) [14], which have been successful in computer vision tasks.
In this paper we propose a system for sound event localization and detection (SELD), which we submitted to Task 3 of the DCASE 2019 Challenge [31].

Summary

Introduction

Sound Event Detection (SED) is an important machine listening task, which aims to automatically recognize, label, and estimate the position in time of sound events in a continuous audio signal. It is a popular research topic because of its many real-world applications, such as home care [1], surveillance [2], environmental monitoring [3], and urban traffic control [4]. The successful Detection and Classification of Acoustic Scenes and Events (DCASE) challenges [5, 6] provide the community with datasets and baselines for a number of SED-related tasks. Other common approaches try to model temporal relations in the audio signal using recurrent neural networks (RNNs).

Methods

Results

Conclusion