Abstract
This paper proposes methods for sound event localization and detection from multichannel recordings. The proposed system uses two Convolutional Recurrent Neural Networks (CRNNs) to perform sound event detection (SED) and time difference of arrival (TDOA) estimation on each pair of microphones in a microphone array. The system is evaluated with a four-microphone array, and thus combines the results from six pairs of microphones to provide a final classification and a 3-D direction of arrival (DOA) estimate. Results demonstrate that the proposed approach outperforms the DCASE 2019 baseline system.
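To make the pairwise geometry concrete, the following is a minimal sketch of how six TDOA values from a four-microphone array could be combined into a single 3-D DOA. It is not the method of the paper, whose combination step is not described in this section. It assumes a far-field (plane-wave) source, a speed of sound of 343 m/s, and a hypothetical tetrahedral array geometry, and it solves for the direction by ordinary least squares. All function names and the array layout are illustrative assumptions.

```python
from itertools import combinations

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the paper

# Hypothetical tetrahedral array (metres); NOT the geometry used in the paper.
MIC_POSITIONS = 0.042 / np.sqrt(3) * np.array([
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
])


def microphone_pairs(n_mics):
    """All unordered microphone pairs; four microphones give six pairs."""
    return list(combinations(range(n_mics), 2))


def pair_baselines(mics):
    """Vector from microphone i to microphone j for every pair (i, j)."""
    return np.array([mics[j] - mics[i] for i, j in microphone_pairs(len(mics))])


def unit_vector(azimuth_deg, elevation_deg):
    """Unit vector pointing from the array towards the source."""
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    return np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])


def simulate_tdoas(mics, direction, c=SPEED_OF_SOUND):
    """Far-field TDOA per pair: tau_ij = t_i - t_j = (p_j - p_i) . u / c."""
    return pair_baselines(mics) @ direction / c


def estimate_doa(mics, tdoas, c=SPEED_OF_SOUND):
    """Least-squares DOA (azimuth, elevation in degrees) from pairwise TDOAs."""
    # Solve baselines @ u = c * tau for u, then project onto the unit sphere.
    u, *_ = np.linalg.lstsq(pair_baselines(mics), c * np.asarray(tdoas), rcond=None)
    u = u / np.linalg.norm(u)
    azimuth = np.degrees(np.arctan2(u[1], u[0]))
    elevation = np.degrees(np.arcsin(np.clip(u[2], -1.0, 1.0)))
    return azimuth, elevation
```

In this sketch the TDOAs are simulated from a known direction. In a system like the one described above, they would instead come from the TDOA-estimating network, one value per microphone pair. The six equations over-determine the three unknowns, so the least-squares fit averages out some per-pair estimation error.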
Highlights
Sound Event Detection (SED) is an important machine listening task, which aims to automatically recognize, label, and estimate the position in time of sound events in a continuous audio signal
The prevailing architectures used for SED are Convolutional Neural Networks (CNNs) [14], which are successful in computer vision tasks
In this paper we propose a system for sound event localization and detection (SELD), which we submitted to Task 3 of the DCASE 2019 Challenge [31]
Summary
Sound Event Detection (SED) is an important machine listening task, which aims to automatically recognize, label, and estimate the position in time of sound events in a continuous audio signal. This is a popular research topic, due to the number of real-world applications for SED such as home care [1], surveillance [2], environmental monitoring [3] or urban traffic control [4], to name just a few. The successful Detection and Classification of Acoustic Scenes and Events (DCASE) challenges [5, 6] provide the community with datasets and baselines for a number of tasks related to SED. Other common approaches try to model time relations in audio signals by using recurrent neural networks (RNNs).