Speech recognition with minimal spectral cues

Robert V. Shannon, Vivek Kamath, Fan-Gang Zeng, Micheal Ekelid, John Wygonski

doi:10.1121/1.409411

Abstract

Speech recognition was measured in conditions that systematically reduced the amount of spectral information while preserving temporal envelope information. Speech stimuli were spectrally separated into several frequency bands. The temporal envelope in each band was extracted by half-wave rectification followed by low-pass filtering. Each resulting envelope was then used to modulate a noise band with the same bandwidth and cut-off frequencies as the original analysis band. Identification of consonants (16 consonants in aCa context), vowels (8 vowels in hVd context), and words in simple sentences (CUNY sentences) was measured as a function of the number and frequency distribution of analysis bands, the envelope filter cut-off frequency, and overall spectral shaping. As the number of channels (analysis bands) increases, consonant recognition improves substantially from one to two channels but improves less from two to four channels. Vowel recognition improves significantly from one to three channels. Sentence recognition improves with the number of channels, approaching 100% correct with four channels. These results indicate that relatively little spectral detail is sufficient for recognition of speech. Results will be discussed in terms of speech processing strategies for cochlear implants. [Work supported by NIDCD.]
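The processing chain described above (band splitting, half-wave rectification and low-pass filtering of each band, then modulation of a matching noise band) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not specify the filters, so brick-wall FFT filters stand in for them. The function names (`fft_bandpass`, `envelope`, `noise_vocode`), the 160 Hz envelope cut-off, the re-filtering of each modulated noise band, and the band edges in the usage example are all hypothetical choices made for this sketch.

```python
import numpy as np


def fft_bandpass(x, fs, lo, hi):
    """Brick-wall band-pass via FFT: keep bins with lo <= f < hi.

    Assumption: an idealized stand-in for the (unspecified) analysis filters.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def envelope(band, fs, cutoff):
    """Temporal envelope: half-wave rectification, then low-pass filtering."""
    rectified = np.maximum(band, 0.0)  # half-wave rectification
    smoothed = fft_bandpass(rectified, fs, 0.0, cutoff)  # low-pass (keeps DC)
    # The brick-wall low-pass can ripple slightly below zero; clip it.
    return np.maximum(smoothed, 0.0)


def noise_vocode(speech, fs, edges, env_cutoff=160.0, seed=0):
    """Replace each analysis band with envelope-modulated noise in that band.

    edges: ascending band edges in Hz, e.g. 5 edges give 4 channels.
    env_cutoff: hypothetical envelope low-pass cut-off in Hz.
    """
    speech = np.asarray(speech, dtype=float)
    rng = np.random.default_rng(seed)
    out = np.zeros(len(speech))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = fft_bandpass(speech, fs, lo, hi)
        env = envelope(band, fs, env_cutoff)
        # Noise carrier with the same cut-off frequencies as the analysis band.
        carrier = fft_bandpass(rng.standard_normal(len(speech)), fs, lo, hi)
        # Assumption: re-filter after modulation so energy stays in the band.
        out += fft_bandpass(env * carrier, fs, lo, hi)
    return out
```

As a usage example, `noise_vocode(x, 16000, [100.0, 800.0, 1500.0, 2500.0, 4000.0])` gives a four-channel version of a 16 kHz signal `x` (these band edges are illustrative only). Passing a single pair of edges gives the one-channel condition, and varying `env_cutoff` corresponds to varying the envelope filter cut-off frequency.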
