Abstract

To reduce neural network parameter counts and improve sound event detection performance, we propose a multiscale time-frequency convolutional recurrent neural network (MTF-CRNN). Our goal is to recognize target sound events of variable duration against different audio backgrounds while keeping the parameter count low. We exploit four groups of parallel and serial convolutional kernels to learn high-level shift-invariant features from the time and frequency domains of acoustic samples. A two-layer bidirectional gated recurrent unit is used to capture the temporal context of the extracted high-level features. The proposed method is evaluated on two different sound event datasets. As a single model with a low parameter count and no pretraining, it substantially outperforms the baseline method and other methods. On the TUT Rare Sound Events 2017 evaluation dataset, our method achieved an error rate (ER) of 0.09±0.01, a relative improvement of 83% over the baseline. On the TAU Spatial Sound Events 2019 evaluation dataset, our system achieved an ER of 0.11±0.01, a relative improvement of 61% over the baseline, with F1 and ER values better than those on the development dataset. Compared to state-of-the-art methods, our proposed network achieves competitive detection performance with only one-fifth of the network parameter count.
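The ER and F1 figures quoted above are the standard segment-based sound event detection metrics. As a point of reference, a minimal sketch of how they are typically derived from per-segment binary activity (following the usual definition, where per-segment substitutions pair one missed event with one false alarm; the example inputs are illustrative, not taken from the paper):

```python
# Sketch of segment-based SED metrics (ER and F1), assuming the reference
# and prediction are lists of per-segment binary activity vectors:
# one inner list per time segment, one 0/1 entry per event class.
def segment_metrics(reference, prediction):
    tp = fp = fn = n_ref = 0
    subs = dels = ins = 0
    for ref_seg, pred_seg in zip(reference, prediction):
        seg_tp = sum(1 for r, p in zip(ref_seg, pred_seg) if r and p)
        seg_fp = sum(1 for r, p in zip(ref_seg, pred_seg) if not r and p)
        seg_fn = sum(1 for r, p in zip(ref_seg, pred_seg) if r and not p)
        tp += seg_tp; fp += seg_fp; fn += seg_fn
        n_ref += sum(ref_seg)
        # Within a segment, substitutions pair one miss with one false
        # alarm; the leftovers count as deletions / insertions.
        seg_subs = min(seg_fn, seg_fp)
        subs += seg_subs
        dels += seg_fn - seg_subs
        ins += seg_fp - seg_subs
    er = (subs + dels + ins) / max(n_ref, 1)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return er, f1
```

For example, with three segments and two classes, `segment_metrics([[1, 0], [1, 1], [0, 1]], [[1, 0], [0, 1], [0, 1]])` has one deletion against four active reference entries, giving an ER of 0.25.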

Highlights

  • Sound event detection (SED) recognizes a target sound and detects the onset and offset times in an audio recording

  • The 3×3 convolutional recurrent neural network (CRNN) performs better than the other STF-CRNNs, likely because its kernel scale is appropriate for the dataset

  • Because SED is only a subtask of DCASE2019 Task 3, the SED performance of CE-CRNN is slightly worse than that of Twostage [50]


Summary

Introduction

Sound event detection (SED) recognizes a target sound and detects its onset and offset times in an audio recording. Detecting sounds such as gunshots, crying babies, falls, malfunctioning machinery, and endangered animal calls allows us to respond appropriately [1]. Traditional sound event detection methods mainly include signal analysis, information entropy [5], statistical analysis, and clustering [6], [7]. Research on SED has shifted from Gaussian mixture model-hidden Markov models (GMM-HMMs) and support vector machines (SVMs) [8] to deep neural networks (DNNs) [9], convolutional neural networks (CNNs), recurrent neural networks (RNNs), and convolutional recurrent neural networks (CRNNs).
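To make the multiscale time-frequency idea concrete, the sketch below runs convolution kernels of different shapes over a spectrogram in parallel and stacks the resulting feature maps. The kernel shapes (1×3, 3×1, 3×3) and the naive loop-based convolution are illustrative assumptions for exposition, not the paper's exact configuration:

```python
# Hypothetical numpy sketch of multiscale time-frequency feature
# extraction: parallel kernels of different shapes over one spectrogram.
import numpy as np

def conv2d_same(x, kernel):
    """Naive 'same'-padded 2D cross-correlation (loop-based, for clarity)."""
    kh, kw = kernel.shape
    pad = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(pad[i:i + kh, j:j + kw] * kernel)
    return out

def multiscale_features(spectrogram, kernels):
    # One feature map per kernel scale, stacked along a new channel axis.
    return np.stack([conv2d_same(spectrogram, k) for k in kernels])

rng = np.random.default_rng(0)
spec = rng.standard_normal((40, 100))   # (mel bins, time frames)
kernels = [np.ones((1, 3)) / 3,         # time-oriented scale
           np.ones((3, 1)) / 3,         # frequency-oriented scale
           np.ones((3, 3)) / 9]         # joint time-frequency scale
feats = multiscale_features(spec, kernels)
# feats has one channel per scale, each the same size as the input
```

In a full CRNN, feature maps like these would feed a recurrent layer (the paper uses a two-layer bidirectional GRU) to model temporal context.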


