Abstract
Algorithms for the automatic detection and recognition of acoustic events are increasingly gaining relevance for the reliable and robust functioning of consumer, assistive and monitoring systems. The extraction of appropriate task-relevant acoustic features from the raw sound signal clearly influences the performance of subsequent statistical classification, in particular in adverse acoustic situations. The present contribution investigates the use of biologically-inspired features, derived from a filter-bank of two-dimensional Gabor functions, that decompose the spectro-temporal power density into components that capture spectral, temporal and joint spectro-temporal modulation patterns. It is hypothesized that the comparatively large joint spectral and temporal extent of these Gabor functions results in features that allow for robust classification. Evaluation of the proposed feature extraction scheme together with a hidden Markov model (HMM) classifier is conducted on two corpora comprising acoustic events in realistic adverse conditions from the D-CASE and CLEAR'07 evaluation campaigns. The relevance of each Gabor filter for classification is analyzed and an optimized parameter set for the Gabor filterbank (GFB) is identified. The performance of the optimized GFB is evaluated in comparison to other state-of-the-art algorithms on isolated event classification and on full acoustic event detection (AED), which includes joint classification and temporal segmentation of events. Results show that Gabor features yield a signal representation that exhibits separated average class-specific patterns. An improvement in classification accuracy of up to 26% relative to the Mel-frequency cepstral coefficient (MFCC) baseline is obtained with the optimized GFB. Further experiments demonstrate that this improvement cannot be explained by purely temporal or purely spectral Gabor basis functions. Rather, a GFB with features extending in joint spectro-temporal directions is required to obtain optimum performance. AED performance on the D-CASE challenge dataset is shown to improve on previous algorithms from the recent literature.
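To illustrate the kind of feature extraction described above, the following is a minimal sketch of a two-dimensional Gabor filterbank applied to a log-mel spectrogram. It is not the authors' exact parameterization: the filter sizes, the Hann envelope, the modulation frequencies and the helper names (gabor_filter_2d, gfb_features) are illustrative assumptions, chosen only to show how purely spectral, purely temporal and joint spectro-temporal filters differ.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_filter_2d(omega_f, omega_t, size_f=9, size_t=9):
    """Complex 2D Gabor function: a spectro-temporal sinusoid with
    modulation frequencies omega_f (spectral, rad/channel) and
    omega_t (temporal, rad/frame) under a separable Hann envelope.
    Illustrative parameterization, not the paper's exact GFB."""
    f = np.arange(size_f) - size_f // 2
    t = np.arange(size_t) - size_t // 2
    F, T = np.meshgrid(f, t, indexing="ij")
    carrier = np.exp(1j * (omega_f * F + omega_t * T))
    envelope = np.hanning(size_f)[:, None] * np.hanning(size_t)[None, :]
    return carrier * envelope

def gfb_features(log_mel_spec, filters):
    """Convolve a (channels x frames) log-mel spectrogram with each
    Gabor filter; keep the real part as one feature map per filter."""
    return [np.real(fftconvolve(log_mel_spec, g, mode="same"))
            for g in filters]

# Hypothetical example: a purely spectral filter (omega_t = 0), a purely
# temporal filter (omega_f = 0), and a joint spectro-temporal filter.
filters = [gabor_filter_2d(0.5, 0.0),
           gabor_filter_2d(0.0, 0.3),
           gabor_filter_2d(0.5, 0.3)]
spec = np.random.randn(40, 200)      # stand-in for a 40-channel log-mel spectrogram
feats = gfb_features(spec, filters)  # three (40 x 200) modulation feature maps
```

In this sketch, the third filter responds only to patterns that modulate jointly across frequency and time, which is the property the abstract identifies as necessary for optimum performance.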