Modeling Human Auditory Perception for Noise-Robust Speech Recognition

Soo-Young Lee

doi:10.1109/icnnb.2005.1614867

Abstract

Several bio-inspired models of human auditory perception are reported for robust speech recognition in real-world noisy environments. The developed mathematical models of the human auditory pathway are integrated into a speech recognition system with three components: (1) a nonlinear feature extraction model from the cochlea to the auditory cortex, (2) a binaural processing model at the superior olivary complex, and (3) a top-down attention model from the higher brain to the cochlea. Unsupervised Independent Component Analysis (ICA) shows that some auditory feature extraction and binaural processing mechanisms are consistent with information theory and sparse representation. The ICA-based features resemble the frequency-limited features extracted by the cochlea as well as the more complex time-frequency features found in the inferior colliculus and auditory cortex. The top-down attention model shows how pre-acquired knowledge in the brain filters out irrelevant features or fills in missing features in the sensory data. The top-down attention and bottom-up binaural processing models are combined into a single system for highly noisy conditions. This auditory model is computationally intensive, and several VLSI implementations have been developed for real-time applications. Experimental results demonstrate much better recognition performance in real-world noisy environments.
