Abstract

We propose using derivative features for sound event detection (SED) based on deep neural networks. As input to the networks, we used the log-mel filterbank and its first and second derivative features for each frame of the audio signal. Two deep neural networks were used to evaluate the effectiveness of these derivative features. Specifically, a convolutional recurrent neural network (CRNN) was constructed by combining a convolutional neural network (CNN) and a recurrent neural network (RNN), followed by a feed-forward neural network (FNN) acting as a classification layer. In addition, a mean-teacher model based on an attention CRNN was used. Both models had an average pooling layer at the output so that weakly labeled and unlabeled audio data could be used during model training. Across the various training conditions, depending on the neural network architecture and training set, the derivative features yielded a consistent performance improvement. Experiments on audio data from the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 and 2019 challenges showed a maximum relative improvement of 16.9% in terms of the F-score.
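
As a concrete illustration of the input pipeline described above, the following sketch computes the three-channel input (static log-mel filterbank plus its first and second derivatives) with librosa. The sample rate, FFT size, hop length, and number of mel bands are illustrative assumptions, not the paper's exact configuration.

import librosa
import numpy as np

def derivative_features(path, sr=16000, n_mels=64):
    # Load audio and compute the static log-mel filterbank
    # (assumed analysis parameters; the paper's settings may differ).
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                         hop_length=512, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)          # (n_mels, frames)
    # First and second temporal derivatives (delta and delta-delta).
    d1 = librosa.feature.delta(logmel, order=1)
    d2 = librosa.feature.delta(logmel, order=2)
    # Stack as three input channels, analogous to the RGB channels of
    # an image, so a 2-D CNN can consume them directly.
    return np.stack([logmel, d1, d2], axis=0)  # (3, n_mels, frames)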

Highlights

  • Humans can obtain information about their surroundings from nearby sounds

  • The classification results on the DCASE 2018 test set are shown in Table 3, where "Single channel" indicates that only the static log-mel filterbank was used as input to the basic convolutional recurrent neural network (CRNN), and "Three channels" indicates that the derivative features were also used as input

  • We proposed the use of the first and second delta features of the log-mel filterbank to improve the performance of state-of-the-art CRNNs



Introduction

Humans can obtain information about their surroundings from nearby sounds. Accordingly, sound signal analysis, whereby information may be automatically extracted from audio data, has attracted considerable attention. The recently proposed convolutional recurrent neural networks (CRNNs), which combine convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have exhibited satisfactory classification performance in sound event detection (SED) [11]. They are currently recognized as a highly effective deep neural network architecture for SED and have been widely used in the DCASE challenge since 2018. The attention method has proven effective in identifying sound events from audio recordings, including noisy sounds. Owing to their availability, unlabeled data are critical for improving SED. In [20], for efficient use of unlabeled training data, a mean-teacher model based on an attention-based CRNN was proposed for SED, and it achieved the best performance in the DCASE 2018 challenge. Deep neural networks for SED have thus evolved from simple feed-forward neural networks (FNNs) to recent CRNNs, in which an attention-based architecture as well as mean-teacher model-based training and evaluation are used.
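
A minimal sketch of such a model and of the mean-teacher update is given below in PyTorch. The layer sizes, class count, and EMA decay are illustrative assumptions, and the attention mechanism of the actual model is replaced here by simple average pooling over time.

import torch
import torch.nn as nn

class CRNN(nn.Module):
    """CNN -> bidirectional GRU -> FNN classifier, with average pooling
    over time so the model can be trained from clip-level (weak) labels."""
    def __init__(self, n_mels=64, n_classes=10, rnn_hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((4, 1)),  # pool frequency only; keep time resolution
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((4, 1)),
        )
        self.rnn = nn.GRU(64 * (n_mels // 16), rnn_hidden,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * rnn_hidden, n_classes)

    def forward(self, x):              # x: (batch, 3, n_mels, frames)
        h = self.cnn(x)                # (batch, 64, n_mels // 16, frames)
        b, c, f, t = h.shape
        h = h.permute(0, 3, 1, 2).reshape(b, t, c * f)
        h, _ = self.rnn(h)
        frame_probs = torch.sigmoid(self.fc(h))  # frame-level (strong) output
        clip_probs = frame_probs.mean(dim=1)     # average pooling -> weak output
        return frame_probs, clip_probs

@torch.no_grad()
def ema_update(teacher, student, alpha=0.999):
    # Mean-teacher update: the teacher's weights are an exponential
    # moving average of the student's weights.
    for tp, sp in zip(teacher.parameters(), student.parameters()):
        tp.mul_(alpha).add_(sp, alpha=1 - alpha)

In such a setup, clip_probs would be scored against weak labels, frame_probs against strong labels where available, and a consistency loss between student and teacher outputs would exploit the unlabeled data, with ema_update refreshing the teacher after each optimizer step.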

Preprocessing
Derivative
Network Architecture
Basic CRNN
Mean-Teacher Model
Structure
Database
Evaluation Metrics
Experimental Results
Learning
Conclusions