Feedback-Driven Sensory Mapping Adaptation for Robust Speech Activity Detection.

Ashwin Bellur, Mounya Elhilali

doi:10.1109/taslp.2016.2639322

Abstract

Parsing natural acoustic scenes using computational methodologies poses many challenges. Given the rich and complex nature of the acoustic environment, data mismatch between training and test conditions is a major hurdle in data-driven audio processing systems. In contrast, the brain exhibits a remarkable ability to segment acoustic scenes with relative ease. When tackling challenging listening conditions that are often faced in everyday life, the biological system relies on a number of principles that allow it to effortlessly parse its rich soundscape. In the current study, we leverage a key principle employed by the auditory system: its ability to adapt the neural representation of its sensory input in a high-dimensional space. We propose a framework that mimics this process in a computational model for robust speech activity detection. The system employs a 2-D Gabor filter bank whose parameters are retuned offline to improve the separability between the feature representation of speech and nonspeech sounds. This retuning process, driven by feedback from statistical models of speech and nonspeech classes, attempts to minimize the misclassification risk of mismatched data, with respect to the original statistical models. We hypothesize that this risk minimization procedure results in an emphasis on unique speech and nonspeech modulations in the high-dimensional space. We show that such an adapted system is indeed robust to other novel conditions, with a marked reduction in equal error rates for a variety of databases with additive and convolutive noise distortions. We discuss the lessons learned from biology with regard to adapting to an ever-changing acoustic environment and the impact on building truly intelligent audio processing systems.
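
The abstract does not give the filter definitions or the retuning rule, so the following is only a generic sketch of what a fixed 2-D Gabor filter bank applied to a spectrogram might look like. It is not the authors' implementation. The function names (`gabor_2d`, `filter_spectrogram`, `gabor_features`) and all parameter values (frame rate, channels per octave, kernel size, envelope widths, the rate and scale grids) are illustrative assumptions, and the feedback-driven retuning step is not shown.

```python
import numpy as np


def gabor_2d(rate_hz, scale_cpo, frame_rate=100.0, chans_per_oct=12.0,
             n_f=25, n_t=25):
    """Real 2-D Gabor kernel laid out as (frequency channel, time frame).

    rate_hz   : temporal modulation frequency of the carrier (Hz)
    scale_cpo : spectral modulation frequency of the carrier (cycles/octave)
    All defaults are hypothetical values chosen for illustration.
    """
    f = (np.arange(n_f) - n_f // 2) / chans_per_oct   # octaves
    t = (np.arange(n_t) - n_t // 2) / frame_rate      # seconds
    F, T = np.meshgrid(f, t, indexing="ij")           # both (n_f, n_t)

    # Gaussian envelope; the kernel spans roughly +/- 3 sigma (assumption).
    sigma_f = n_f / (6.0 * chans_per_oct)
    sigma_t = n_t / (6.0 * frame_rate)
    envelope = np.exp(-0.5 * ((F / sigma_f) ** 2 + (T / sigma_t) ** 2))

    # Sinusoidal carrier tuned to one (rate, scale) modulation pair.
    carrier = np.cos(2.0 * np.pi * (rate_hz * T + scale_cpo * F))

    kernel = envelope * carrier
    return kernel - kernel.mean()                     # remove the DC offset


def filter_spectrogram(spec, kernel):
    """Zero-padded 2-D convolution via the FFT, cropped to spec's shape."""
    full_shape = (spec.shape[0] + kernel.shape[0] - 1,
                  spec.shape[1] + kernel.shape[1] - 1)
    product = np.fft.rfft2(spec, s=full_shape) * np.fft.rfft2(kernel, s=full_shape)
    full = np.fft.irfft2(product, s=full_shape)
    r0 = (kernel.shape[0] - 1) // 2
    c0 = (kernel.shape[1] - 1) // 2
    return full[r0:r0 + spec.shape[0], c0:c0 + spec.shape[1]]


def gabor_features(spec, rates=(2.0, 4.0, 8.0, 16.0), scales=(0.5, 1.0, 2.0)):
    """Stack one filtered copy of the spectrogram per (rate, scale) pair."""
    return np.stack([filter_spectrogram(spec, gabor_2d(r, s))
                     for r in rates for s in scales])
```

Given a log-frequency spectrogram `spec` of shape (channels, frames), `gabor_features(spec)` returns an array of shape (number of filters, channels, frames). In a system like the one the abstract describes, the carrier and envelope parameters would be the quantities adjusted by the retuning step rather than fixed constants as they are here.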

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Dec 13, 2016
Citations: 53

Similar Papers

Word confidence calibration using a maximum entropy model with constraints on confidence and word distributions
Dong Yu ... Jinyu Li
01 Jan 2009

Recurrent neural network and LSTM models for lexical utterance classification
Suman Ravuri ... Andreas Stolcke
06 Sep 2015

ASVspoof 2017 Version 2.0: meta-data analysis and baseline enhancements
Héctor Delgado ... Md Sahidullah
26 Jun 2018

Increasing the Robustness of i-vectors with Model Compensated First Order Statistics
Gökay Di̇şken ... Zekeriya Tüfekci̇
Afyon Kocatepe University Journal of Sciences and Engineering | VOL. 23
01 Mar 2023
