Abstract

The automatic detection and recognition of sound events by computers is a requirement for a number of emerging sensing and human-computer interaction technologies. Recent advances in this field have been achieved by machine learning classifiers working in conjunction with time-frequency feature representations. This combination has achieved excellent accuracy for the classification of discrete sounds. The ability to recognise sounds under real-world noisy conditions, called robust sound event classification, is an especially challenging task that has attracted recent research attention. Another aspect of real-world conditions is the classification of continuous, occluded or overlapping sounds, rather than of short isolated sound recordings. This paper addresses the classification of noise-corrupted, occluded, overlapped, continuous sound recordings. It first proposes a standard evaluation task for such sounds based upon a common existing method for evaluating isolated sound classification. It then adapts several high-performing isolated sound classifiers to operate on continuous sound data by incorporating an energy-based event detection front end. Results are reported for each tested system using the new task, providing the first analysis of their performance for continuous sound event detection. In addition, it proposes and evaluates a novel Bayesian-inspired front end for the segmentation and detection of continuous sound recordings prior to classification.

Highlights

  • Sound event classification requires a trained system, when presented with an unknown sound, to correctly identify the class of that sound.

  • We constructed an L-layer deep neural network (DNN) with the input fed from the chosen feature vectors (e.g. the spectrogram image feature (SIF), shown in Fig. 3) and the output layer in a one-of-K configuration. The DNN begins with a number of individually pre-trained restricted Boltzmann machine (RBM) pairs, each of which has V visible input nodes and H hidden stochastic nodes, v = [v_1, ..., v_V]^T and h = [h_1, ..., h_H]^T, which are stacked to form a deep network.

  • The automatic speech recognition (ASR)-inspired mel-frequency cepstral coefficient (MFCC)-hidden Markov model (HMM) method is the least noise-robust. SIF with a convolutional neural network (CNN) appears most capable for the 20 dB and 10 dB conditions, which are likely to encompass the main range of realistic deployment scenarios, while SIF-SVM maintains a slight advantage in the highly noisy 0 dB environment.
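The greedy layer-wise RBM pre-training described in the highlight above can be sketched as follows. This is a minimal illustrative implementation, assuming CD-1 (one-step contrastive divergence) training on real-valued inputs; the function names, learning rate and layer sizes are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """One RBM pair with V visible and H hidden stochastic nodes."""
    def __init__(self, V, H):
        self.W = 0.01 * rng.standard_normal((V, H))
        self.b_v = np.zeros(V)   # visible biases
        self.b_h = np.zeros(H)   # hidden biases

    def hidden_probs(self, v):
        # P(h = 1 | v) for a single sample or a batch of samples
        return sigmoid(v @ self.W + self.b_h)

    def train_step(self, v0, lr=0.1):
        # Contrastive divergence (CD-1): one Gibbs step v0 -> h0 -> v1 -> h1
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h0_sample @ self.W.T + self.b_v)
        h1 = self.hidden_probs(v1)
        # Update weights toward the data statistics, away from the model's
        self.W += lr * (v0[:, None] * h0[None, :] - v1[:, None] * h1[None, :])
        self.b_v += lr * (v0 - v1)
        self.b_h += lr * (h0 - h1)

def pretrain_stack(data, layer_sizes, epochs=5):
    """Greedy layer-wise pre-training: each RBM is trained on the hidden
    activations of the previous one, then the pairs are stacked to form
    the deep network."""
    rbms, x = [], data
    for H in layer_sizes:
        rbm = RBM(x.shape[1], H)
        for _ in range(epochs):
            for sample in x:
                rbm.train_step(sample)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)   # feed activations to the next layer up
    return rbms
```

After pre-training, the stacked weights would initialise the L-layer DNN, which is then fine-tuned discriminatively against the one-of-K output layer.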


Summary

Introduction

Sound event classification requires a trained system, when presented with an unknown sound, to correctly identify the class of that sound. This paper adapts several state-of-the-art machine hearing methods to the classification of continuous, noise-corrupted and occluded sounds. It defines a first standardised evaluation method for such sounds, extending the commonly used robust sound event classification evaluation task from [10,11,12,13,14,15] into a test that includes all three aspects of real-world performance: noise robustness, occlusion/overlap and event occurrence detection. We begin with previously published isolated event classifiers that have demonstrated good performance as our baselines, namely MFCC with HMM [13], SIF with SVM [14], SIF with DNN [14] and SIF with CNN [15]. These will all be evaluated with an energy-based sound event detector front end, which will be discussed below. Feature vector v comprises elements v(i), where v(i) = S(⌊i/B⌋, i − B⌊i/B⌋) for i = 0 … BD − 1.
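The feature vector construction above simply flattens a spectrogram patch of B frequency bins by D frames into a single BD-element vector. A minimal numpy sketch, assuming the spectrogram S is stored with frames as rows and bins as columns (the function name and layout are our illustration, not the paper's code):

```python
import numpy as np

def sif_vector(S):
    """Flatten a D x B spectrogram patch S (D frames, B frequency bins)
    into a feature vector v with elements
    v(i) = S(floor(i/B), i - B*floor(i/B))  for i = 0 .. B*D - 1."""
    D, B = S.shape
    return np.array([S[i // B, i % B] for i in range(B * D)])
```

Note that with this layout the formula is equivalent to a row-major flatten of S, i.e. `S.flatten()`.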

Background probability scaling and thresholding
Results and discussion
Conclusion and future work
