Abstract

Physiological parametric extraction uses auditory models as a front end for speech recognition. These methods assume that if speech signals are coded in the same way as in the human auditory system, they can later be recognized with the main properties that biological systems exhibit: robustness and accuracy. The proposed system consists of a cochlear model implemented with gammatone filterbanks as proposed by Patterson [J. Acoust. Soc. Am. 96, 1409–1418 (1994)]. This stage feeds a nonlinear mechanical-to-neural transduction module based on the Meddis hair-cell model [J. Acoust. Soc. Am. 79, 702–711 (1986)], which computes auditory-nerve firings. Finally, a temporal integration/component-extraction module integrates the neural patterns to identify the relevant components embedded in the speech signal [characteristic frequency (CF), frequency modulation (FM), and noise burst (NB)], which are shared by human speech and animal communication sounds. The model adopts a spatiotemporal strategy, using temporal information in low-CF fibers (phase-locking mechanism) and spatial information in the higher ones. The recognition module consists of a time-delay self-organizing map, which captures not only the spectral variability contained in the signal but also the temporal variability, providing better generalization properties. [Work supported by NATO CRG-960053.]
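The cochlear front end described above could be sketched as follows. This is a minimal illustration, assuming fourth-order gammatone filters with centre frequencies spaced on an ERB-rate scale and the common bandwidth factor of 1.019 (the Patterson/Holdsworth parameterisation); the function names (erb, gammatone_ir, gammatone_filterbank) and all parameter values are hypothetical and not taken from the paper.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth (Hz) of an auditory filter centred at fc (Hz)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.025, order=4, b_factor=1.019):
    """Impulse response of a gammatone filter: t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    t = np.arange(0.0, duration, 1.0 / fs)
    b = b_factor * erb(fc)
    g = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t) * np.cos(2.0 * np.pi * fc * t)
    return g / np.max(np.abs(g))  # crude peak normalisation

def gammatone_filterbank(signal, fs, n_channels=32, f_lo=100.0, f_hi=4000.0):
    """Filter `signal` through n_channels gammatone filters spaced on an ERB-rate scale."""
    # ERB-rate scale (Glasberg & Moore): ERBrate(f) = 21.4 * log10(4.37 f / 1000 + 1)
    hz_to_erbrate = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    erbrate_to_hz = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    centres = erbrate_to_hz(np.linspace(hz_to_erbrate(f_lo), hz_to_erbrate(f_hi), n_channels))
    outputs = np.stack([np.convolve(signal, gammatone_ir(fc, fs), mode="same")
                        for fc in centres])
    return centres, outputs  # shapes: (n_channels,), (n_channels, n_samples)

# Usage: cochleagram of a synthetic vowel-like tone complex.
fs = 16000
t = np.arange(0.0, 0.2, 1.0 / fs)
x = sum(np.sin(2.0 * np.pi * f * t) for f in (200, 400, 600))
cf, cochleagram = gammatone_filterbank(x, fs)
```

The multichannel output of such a filterbank would then drive the hair-cell transduction and temporal integration stages; the later modules (Meddis transduction, CF/FM/NB extraction, time-delay self-organizing map) are not reproduced here.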
