Abstract

Smith and Lewicki (Neural Comp., 17, 19–45, 2005a; Adv. Neural Inf. Process. Syst., 17, 1289–1296, 2005b; Nature, 439, 7079, 2006) demonstrated that mammalian hearing follows an efficient coding principle (Barlow, Sensory Communication, 217–234, 1961; Atick, Network, 3(2), 213–251, 1992; Simoncelli and Olshausen, Ann. Rev. Neurosci., 24, 1193–1216, 2001; Laughlin and Sejnowski, Science, 301, 1870–1874, 2003). Auditory neurons efficiently code for natural sounds in the environment, maximizing information rate while minimizing coding cost (Shannon, Bell System Technical Journal, 27, 379–423, 623–656, 1948). Applying the same analysis to speech coding suggests that speech acoustics are optimally adapted to the mammalian auditory code (Smith and Lewicki, Neural Comp., 17, 19–45, 2005a; Adv. Neural Inf. Process. Syst., 17, 1289–1296, 2005b; Nature, 439, 7079, 2006). The present work applies this efficient coding theory to the problem of speech perception in individuals using cochlear implants (CIs), among whom there are vast individual differences in speech perception and spectral resolution (Zeng et al., Auditory Prostheses and Electric Hearing, 20, 1–14, 2004). A machine-learning method for CI filterbank design based on the efficient-coding hypothesis is presented. Further, a pair of experiments evaluating this approach using noise-excited vocoder speech (Shannon et al., Science, 270, 303–304, 1995) is described. Participants' recognition of continuous speech and isolated syllables is significantly more accurate for speech filtered through the theoretically motivated efficient-coding filterbank than through the standard cochleotopic filterbank, particularly for speech transients. These findings offer insight into CI design and provide behavioral evidence for efficient coding in human perception.
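The noise-excited vocoder paradigm mentioned above (Shannon et al., 1995) can be sketched as follows: the signal is split into frequency bands, each band's amplitude envelope is extracted, and the envelopes are used to modulate band-limited noise, discarding spectral fine structure within each channel. This is a minimal illustrative sketch only; the channel count, filter order, and band edges are assumptions for demonstration, not the filterbanks used in the experiments described here.

```python
# Minimal sketch of a noise-excited channel vocoder (Shannon et al., 1995 style).
# Band edges, channel count, and filter order below are illustrative assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocoder(signal, fs, band_edges, seed=0):
    """Replace spectral detail in each band with envelope-modulated noise."""
    rng = np.random.default_rng(seed)
    out = np.zeros(len(signal), dtype=float)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)            # analysis band
        env = np.abs(hilbert(band))                # amplitude envelope
        carrier = sosfiltfilt(sos, rng.standard_normal(len(signal)))
        out += env * carrier                       # envelope-modulated noise
    return out

# Usage: 4-channel vocoding of a synthetic amplitude-modulated tone
fs = 16000
t = np.arange(fs) / fs
speechlike = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
edges = [100, 500, 1000, 2000, 4000]   # hypothetical cochleotopic-style edges
voc = noise_vocoder(speechlike, fs, edges)
```

A cochleotopic filterbank spaces the band edges according to the cochlear frequency map, whereas the efficient-coding approach described in the abstract instead learns the filters from the statistics of natural sound.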
