Abstract

The automatic detection and recognition of sound events by computers is a requirement for a number of emerging sensing and human-computer interaction technologies. Recent advances in this field have been achieved by machine learning classifiers working in conjunction with time-frequency feature representations. This combination has achieved excellent accuracy for the classification of discrete sounds. The ability to recognise sounds under real-world noisy conditions, called robust sound event classification, is an especially challenging task that has attracted recent research attention. Another aspect of real-world conditions is the classification of continuous, occluded or overlapping sounds, rather than of short isolated sound recordings. This paper addresses the classification of noise-corrupted, occluded, overlapped, continuous sound recordings. It first proposes a standard evaluation task for such sounds based upon a common existing method for evaluating isolated sound classification. It then adapts several high-performing isolated sound classifiers to operate on continuous sound data by incorporating an energy-based event detection front end. Results are reported for each tested system using the new task, providing the first analysis of their performance for continuous sound event detection. In addition, it proposes and evaluates a novel Bayesian-inspired front end for the segmentation and detection of continuous sound recordings prior to classification.

Highlights

  • Sound event classification requires a trained system, when presented with an unknown sound, to correctly identify the class of that sound.

  • We constructed an L-layer deep neural network (DNN) with the input fed from the chosen feature vectors (e.g. the spectrogram image feature (SIF), shown in Fig. 3) and the output layer in a one-of-K configuration. The DNN begins with a number of individually pre-trained restricted Boltzmann machine (RBM) pairs, each of which has V visible input nodes and H hidden stochastic nodes, v = [v_1, ..., v_V]^T and h = [h_1, ..., h_H]^T, which are stacked to form a deep network.

  • The automatic speech recognition (ASR)-inspired mel-frequency cepstral coefficient (MFCC)-hidden Markov model (HMM) method is the least noise-robust. SIF with a convolutional neural network (CNN) appears most capable for the 20 dB and 10 dB conditions, which are likely to encompass the main range of realistic deployment scenarios, while SIF-SVM maintains a slight advantage in the highly noisy 0 dB environment.
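The greedy layer-wise RBM pre-training described in the highlight above can be sketched as follows. This is a minimal illustrative implementation, assuming CD-1 (one-step contrastive divergence) training on real-valued inputs; the function names, learning rate and layer sizes are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """One RBM pair with V visible and H hidden stochastic nodes."""
    def __init__(self, V, H):
        self.W = 0.01 * rng.standard_normal((V, H))
        self.b_v = np.zeros(V)   # visible biases
        self.b_h = np.zeros(H)   # hidden biases

    def hidden_probs(self, v):
        # P(h = 1 | v) for a single sample or a batch of samples
        return sigmoid(v @ self.W + self.b_h)

    def train_step(self, v0, lr=0.1):
        # Contrastive divergence (CD-1): one Gibbs step v0 -> h0 -> v1 -> h1
        h0 = self.hidden_probs(v0)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h0_sample @ self.W.T + self.b_v)
        h1 = self.hidden_probs(v1)
        # Update weights toward the data statistics, away from the model's
        self.W += lr * (v0[:, None] * h0[None, :] - v1[:, None] * h1[None, :])
        self.b_v += lr * (v0 - v1)
        self.b_h += lr * (h0 - h1)

def pretrain_stack(data, layer_sizes, epochs=5):
    """Greedy layer-wise pre-training: each RBM is trained on the hidden
    activations of the previous one, then the pairs are stacked to form
    the deep network."""
    rbms, x = [], data
    for H in layer_sizes:
        rbm = RBM(x.shape[1], H)
        for _ in range(epochs):
            for sample in x:
                rbm.train_step(sample)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)   # feed activations to the next layer up
    return rbms
```

After pre-training, the stacked weights would initialise the L-layer DNN, which is then fine-tuned discriminatively against the one-of-K output layer.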


Summary

Introduction

Sound event classification requires a trained system, when presented with an unknown sound, to correctly identify the class of that sound. This paper adapts several state-of-the-art machine hearing methods to the classification of continuous, noise-corrupted and occluded sounds. It defines a first standardised evaluation method for such sounds, extending the commonly used robust sound event classification evaluation task from [10,11,12,13,14,15] into a test that includes all three aspects of real-world performance: noise robustness, occlusion/overlap and event occurrence detection. We begin with previously published isolated event classifiers that have demonstrated good performance as our baselines, namely MFCC with HMM [13], SIF with SVM [14], SIF with DNN [14] and SIF with CNN [15]. These will all be evaluated with an energy-based sound event detector front end, which will be discussed below. Feature vector v comprises elements v(i), where v(i) = S(⌊i/B⌋, i − B⌊i/B⌋) for i = 0 … BD − 1.
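The feature vector construction above simply flattens a spectrogram patch of B frequency bins by D frames into a single BD-element vector. A minimal numpy sketch, assuming the spectrogram S is stored with frames as rows and bins as columns (the function name and layout are our illustration, not the paper's code):

```python
import numpy as np

def sif_vector(S):
    """Flatten a D x B spectrogram patch S (D frames, B frequency bins)
    into a feature vector v with elements
    v(i) = S(floor(i/B), i - B*floor(i/B))  for i = 0 .. B*D - 1."""
    D, B = S.shape
    return np.array([S[i // B, i % B] for i in range(B * D)])
```

Note that with this layout the formula is equivalent to a row-major flatten of S, i.e. `S.flatten()`.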

Background probability scaling and thresholding
Results and discussion
Conclusion and future work
