Abstract
Algorithms for the automatic detection and recognition of acoustic events are increasingly gaining relevance for the reliable and robust functioning of consumer, assistive and monitoring systems. The extraction of appropriate task-relevant acoustic features from the raw sound signal clearly influences the performance of subsequent statistical classification, in particular in adverse acoustic situations. The present contribution investigates the use of biologically-inspired features, derived from a filter-bank of two-dimensional Gabor functions, that decompose the spectro-temporal power density into components that capture spectral, temporal and joint spectro-temporal modulation patterns. It is hypothesized that the comparatively large joint spectral and temporal extent of these Gabor functions results in features that allow for robust classification. Evaluation of the proposed feature extraction scheme together with a hidden Markov model (HMM) classifier is conducted on two corpora comprising acoustic events in realistic adverse conditions from the D-CASE and CLEAR'07 evaluation campaigns. The relevance of each Gabor filter for classification is analyzed and an optimized parameter set for the Gabor filterbank (GFB) is identified. The performance of the optimized GFB is evaluated in comparison to other state-of-the-art algorithms on isolated event classification and on full acoustic event detection (AED), which includes joint classification and temporal segmentation of events. Results show that Gabor features yield a signal representation that exhibits separated average class-specific patterns. An improvement in classification accuracy of up to 26% relative to the Mel-frequency cepstral coefficient (MFCC) baseline is obtained with the optimized GFB. Further experiments demonstrate that this improvement cannot be explained by purely temporal or purely spectral Gabor basis functions. Rather, a GFB with features extending in joint spectro-temporal directions is required to obtain optimum performance. AED performance on the D-CASE challenge dataset is shown to improve on previous algorithms from the recent literature.
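To illustrate the kind of feature extraction described above, the following is a minimal sketch of a two-dimensional Gabor filterbank applied to a log-mel spectrogram. It is not the authors' exact parameterization: the filter sizes, the Hann envelope, the modulation frequencies and the helper names (gabor_filter_2d, gfb_features) are illustrative assumptions, chosen only to show how purely spectral, purely temporal and joint spectro-temporal filters differ.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_filter_2d(omega_f, omega_t, size_f=9, size_t=9):
    """Complex 2D Gabor function: a spectro-temporal sinusoid with
    modulation frequencies omega_f (spectral, rad/channel) and
    omega_t (temporal, rad/frame) under a separable Hann envelope.
    Illustrative parameterization, not the paper's exact GFB."""
    f = np.arange(size_f) - size_f // 2
    t = np.arange(size_t) - size_t // 2
    F, T = np.meshgrid(f, t, indexing="ij")
    carrier = np.exp(1j * (omega_f * F + omega_t * T))
    envelope = np.hanning(size_f)[:, None] * np.hanning(size_t)[None, :]
    return carrier * envelope

def gfb_features(log_mel_spec, filters):
    """Convolve a (channels x frames) log-mel spectrogram with each
    Gabor filter; keep the real part as one feature map per filter."""
    return [np.real(fftconvolve(log_mel_spec, g, mode="same"))
            for g in filters]

# Hypothetical example: a purely spectral filter (omega_t = 0), a purely
# temporal filter (omega_f = 0), and a joint spectro-temporal filter.
filters = [gabor_filter_2d(0.5, 0.0),
           gabor_filter_2d(0.0, 0.3),
           gabor_filter_2d(0.5, 0.3)]
spec = np.random.randn(40, 200)      # stand-in for a 40-channel log-mel spectrogram
feats = gfb_features(spec, filters)  # three (40 x 200) modulation feature maps
```

In this sketch, the third filter responds only to patterns that modulate jointly across frequency and time, which is the property the abstract identifies as necessary for optimum performance.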