Gammatone Filterbank Research Articles

Adverse noisy conditions pose great challenges to automatic speech applications including speaker and language identification (SID and LID), where mel-frequency cepstral coefficients (MFCC) are the most commonly adopted acoustic features. Although systems trained using MFCCs provide competitive performance under matched conditions, it is well known that such systems are susceptible to acoustic mismatch between training and test conditions due to noise and channel degradations. Motivated by this fact, this study proposes an alternative noise-robust acoustic feature front-end that is capable of capturing speaker identity as well as language structure/content conveyed in the speech signal. Specifically, a feature extraction procedure inspired by human auditory processing is proposed. The proposed feature is based on the Hilbert envelope of Gammatone filterbank outputs, which represent the envelope of the auditory nerve response. The subband amplitude modulations, which are captured through smoothed Hilbert envelopes (a.k.a. temporal envelopes), carry useful acoustic information and have been shown to be robust to signal degradations. The effectiveness of the proposed front-end, termed mean Hilbert envelope coefficients (MHEC), is evaluated in the context of SID and LID tasks using degraded speech material from the DARPA Robust Automatic Transcription of Speech (RATS) program. In addition, we investigate how the dynamic range compression stage in the MHEC feature extraction process affects performance, using logarithmic and power-law nonlinearities. Experimental results indicate that: (i) the MHEC feature is highly effective and performs favorably compared to other conventional and state-of-the-art front-ends, and (ii) the power-law nonlinearity consistently yields the best performance across different conditions for both SID and LID tasks.
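The processing chain described in this abstract (Gammatone filterbank, smoothed Hilbert envelope, frame averaging, dynamic range compression, cepstral transform) can be sketched roughly as follows. This is an illustrative approximation only, not the authors' implementation: the function names, the log-spaced center frequencies, the 25 ms / 10 ms framing, the 5 ms moving-average smoother, the ERB formula, and the power-law exponent of 1/15 are all assumptions that the abstract does not state.

```python
import numpy as np

def erb(fc):
    # Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula; an assumption here)
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.025, order=4):
    # Gammatone impulse response: t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))  # normalize to unit energy

def hilbert_envelope(x):
    # Magnitude of the analytic signal, built by zeroing negative frequencies
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))

def band_envelopes(x, fs, centers, frame=0.025, hop=0.010, smooth=0.005):
    # Per-frame mean of the smoothed Hilbert envelope in each Gammatone channel
    flen, fhop = int(frame * fs), int(hop * fs)
    kernel = np.ones(max(1, int(smooth * fs)))
    kernel /= kernel.size  # moving-average smoother (hypothetical choice)
    n_frames = 1 + (len(x) - flen) // fhop
    out = np.empty((n_frames, len(centers)))
    for j, fc in enumerate(centers):
        y = np.convolve(x, gammatone_ir(fc, fs), mode="same")
        env = np.convolve(hilbert_envelope(y), kernel, mode="same")
        for i in range(n_frames):
            out[i, j] = env[i * fhop:i * fhop + flen].mean()
    return out

def mhec_like(x, fs, centers, n_ceps=12, power=1.0 / 15.0):
    # Power-law compression followed by a DCT-II across channels.
    # Replacing np.power(...) with np.log(e + 1e-12) gives the logarithmic variant.
    e = band_envelopes(x, fs, centers)
    c = np.power(e + 1e-12, power)
    m = len(centers)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * np.arange(m) + 1) / (2 * m))
    return c @ basis.T
```

Calling `mhec_like(x, fs, np.geomspace(200.0, 3400.0, 16))` on a mono signal `x` returns one row of coefficients per frame. Keeping the compression step as a single line makes it easy to swap between the two nonlinearities the abstract compares.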

Interaural Level Difference (ILD) provides an important cue for the location of a sound source in the azimuthal plane. Typically, ILD decoding in the brainstem is modeled as a subtraction of spike rates, with inhibitory inputs from one ear subtracted from the excitatory inputs from the other [1-3]. The inferior colliculus (IC) is known to receive input from this circuit, and to encode the spatial location of sounds. Recent experimental evidence suggests that inhibitory input for ILD does not provide subtraction, but instead provides a gain adjustment [4]. In addition, the exact mechanism by which spatial receptive fields are created in the inferior colliculus remains unclear, and may also be a gain mechanism [5]. The excitatory input to the IC from neurons that decode ILD may contain spike-timing cues for location. These spike-timing cues may be initiated in the ILD encoding cells even if the cues are absent from the inputs from the cochlear nucleus. In this study we used a spiking neuron model to recreate and model the full circuit of ILD sensitivity, and to explore both the issue of ILD decoding and the representation of sound source location in the IC. The auditory periphery was modeled as a gammatone filterbank which provided inputs directly to a leaky integrate-and-fire model representing the cells of the cochlear nucleus. These cells are known to lock to the envelope of the sound stimulus, and this behavior was recreated by low-pass filtering of the gammatone filterbank inputs to the cells, and use of a dynamic spike threshold mechanism [6]. The ILD sensitive cells and IC cells were both modeled as simple leaky integrate-and-fire neurons. The model was able to recreate important experimental results regarding ILD encoding cells, particularly the variation of sensitivity with source intensity, and successfully created spatial receptive fields like those found in the IC. The results will be helpful in understanding the binaural mechanisms of the auditory brainstem.
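As a rough illustration of the building blocks named in this abstract, the sketch below simulates a plain leaky integrate-and-fire neuron driven by a slowly modulated envelope, then forms the classic rate-subtraction ILD readout that the abstract says recent evidence calls into question. It is not the authors' model: there is no gammatone stage or dynamic spike threshold, and every parameter (time constant, threshold, 20 Hz modulation rate, drive levels) as well as the function names are hypothetical choices made for the example.

```python
import math

def lif_spike_times(drive, dt=1e-4, tau=0.01, v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron, tau * dv/dt = -(v - v_rest) + drive(t),
    integrated with forward Euler. Returns the list of spike times in seconds."""
    v = v_rest
    spikes = []
    for i, d in enumerate(drive):
        v += dt / tau * (-(v - v_rest) + d)
        if v >= v_thresh:
            spikes.append(i * dt)
            v = v_reset  # reset the membrane after each spike
    return spikes

def firing_rate(level, duration=0.5, dt=1e-4, fm=20.0):
    """Spikes per second for an envelope-like drive: a raised sinusoid at fm Hz
    scaled by `level` (a stand-in for a low-pass filtered filterbank output)."""
    n = int(duration / dt)
    drive = [level * 0.5 * (1.0 + math.sin(2 * math.pi * fm * i * dt)) for i in range(n)]
    return len(lif_spike_times(drive, dt=dt)) / duration

# Rate-subtraction readout: excitatory (ipsilateral) rate minus inhibitory (contralateral) rate
ipsi_level, contra_level = 4.0, 2.0
rate_difference = firing_rate(ipsi_level) - firing_rate(contra_level)
```

A gain-based alternative would scale the excitatory response by a function of the contralateral level instead of subtracting a rate; the abstract does not specify that function, so it is left out here.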

Articles published on Gammatone Filterbank

Analysis of acoustic features for speech intelligibility prediction models

Human-inspired modulation frequency features for noise-robust ASR

A Study of Musical Pitch Distance Using a Self-Organized Hierarchical Linear Dynamical System on Acoustic Signals

Speech enhancement of instantaneous amplitude and phase for applications in noisy reverberant environments

A compact digital gamma-tone filter processor

Attention selectively modulates cortical entrainment in different regions of the speech spectrum

Neuro-steered noise suppression for auditory prostheses

Improving neural decoding in the central auditory system using bio-inspired spectro-temporal representations and a generalized bilinear model.

Perceptually Accurate Reproduction of Recorded Sound Fields in a Reverberant Room Using Spatially Distributed Loudspeakers

Mean Hilbert envelope coefficients (MHEC) for robust speaker and language identification

Acoustic feature extraction method for robust speaker identification

Vowel context effects on the spectral dynamics of English and Japanese sibilant fricatives

Online Monaural Speech Enhancement Based on Periodicity Analysis and A Priori SNR Estimation

Optimizing pulse-spreading harmonic complexes to minimize intrinsic modulations after auditory filtering.

Spiking models of interaural level difference encoding - beyond the rate subtraction code

The auditory image model and me

Cochleagram-based audio pattern separation using two-dimensional non-negative matrix factorization with automatic sparsity adaptation

Speech enhancement based on perceptual filter bank improvement

A Comparison of Spectro-Temporal Representations of Audio Signals

Underwater acoustic target classification and auditory feature identification based on dissimilarity evaluation
