Auditory Models and Human Performance in Tasks Related to Speech Coding and Speech Recognition

Oded Ghitza

doi:10.1007/978-1-4615-2281-2_17

Abstract

Auditory models that are capable of achieving human performance in tasks related to speech perception would provide a basis for realizing effective speech processing systems. Saving bits in speech coders, for example, relies on a perceptual tolerance to acoustic deviations from the original speech. Perceptual invariance to adverse signal conditions (noise, microphone and channel distortions, room reverberations) and to phonemic variability (due to nonuniqueness of articulatory gestures) may provide a basis for robust speech recognition. A state-of-the-art auditory model that simulates, in considerable detail, the outer parts of the auditory periphery up through the auditory nerve level is described. Speech information is extracted from the simulated auditory nerve firings, and used in place of the conventional input to several speech coding and recognition systems. The performance of these systems improves as a result of this replacement, but is still short of achieving human performance. The shortcomings occur, in particular, in tasks related to low bit-rate coding and to speech recognition. Since schemes for low bit-rate coding rely on signal manipulations that spread over durations of several tens of ms, and since schemes for speech recognition rely on phonemic/articulatory information that extends over similar time intervals, it is concluded that the shortcomings are due mainly to perceptually related rules over durations of 50-100 ms. These observations suggest a need for a study aimed at understanding how auditory nerve activity is integrated over time intervals of that duration. The author discusses preliminary experimental results that confirm human usage of such integration, with different integration rules for different time-frequency regions depending on the phoneme-discrimination task.

Similar Papers

Combined speech enhancement and auditory modelling for robust distributed speech recognition
Ronan Flynn ... Edward Jones
Speech Communication | VOL. 50
20 May 2008

Neural Presbyacusis in Humans Inferred from Age-Related Differences in Auditory Nerve Function and Structure.
Kelly C Harris ... Carolyn M Mcclaskey
The Journal of Neuroscience | VOL. 41
09 Nov 2021

Modern Methods of Speech Processing
Ravi P Ramachandran ... Richard J Mammone
01 Jan 1995

The representation of speech in a nonlinear auditory model: time-domain analysis of simulated auditory-nerve firing patterns
Guy J. Brown ... Nicholas R. Clark
27 Aug 2011
