Abstract
In social settings, speech waveforms from nearby speakers mix together in our ear canals. Normally, the brain unmixes the attended speech stream from the chorus of background speakers using a combination of fast temporal processing and cognitive active listening mechanisms. In a review of >100,000 patient records, we found that ~10% of adults had visited our clinic because of reduced hearing, only to learn that their hearing was clinically normal and should not cause communication difficulties. We found that multi-talker speech intelligibility thresholds varied widely in normal-hearing adults, but could be predicted from neural phase-locking to frequency modulation (FM) cues measured with ear-canal EEG recordings. Combining neural temporal fine structure processing, pupil-indexed listening effort, and behavioral FM thresholds accounted for 78% of the variability in multi-talker speech intelligibility. The disordered bottom-up and top-down markers of poor multi-talker speech perception identified here could inform the design of next-generation clinical tests for hidden hearing disorders.
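The 78% figure is a coefficient of determination (R²) for a model that combines the three measures. As a minimal illustration of how a multi-predictor R² of this kind is computed (not the study's actual model, data, or variable names, which are all hypothetical here), a Python sketch using ordinary least squares on synthetic values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                                      # hypothetical number of listeners

# Hypothetical predictors (names illustrative, not the study's variables):
neural_tfs = rng.normal(size=n)              # neural sTFS phase-locking strength
pupil_effort = rng.normal(size=n)            # pupil-indexed listening effort
fm_threshold = rng.normal(size=n)            # behavioral FM detection threshold

# Simulate an intelligibility outcome driven by all three measures plus noise.
speech_intel = (0.6 * neural_tfs - 0.4 * pupil_effort
                - 0.5 * fm_threshold + rng.normal(scale=0.5, size=n))

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), neural_tfs, pupil_effort, fm_threshold])
beta, *_ = np.linalg.lstsq(X, speech_intel, rcond=None)
pred = X @ beta

# R^2: fraction of outcome variance explained by the combined predictors.
ss_res = np.sum((speech_intel - pred) ** 2)
ss_tot = np.sum((speech_intel - speech_intel.mean()) ** 2)
print(f"R^2 = {1 - ss_res / ss_tot:.2f}")    # synthetic value, unrelated to the paper's 0.78
```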
Highlights
To test the hypothesis that early neural processing of sTFS cues in frequency-modulated (FM) tones is associated with superior speech-in-noise processing, we developed a non-invasive physiological measure of neural sTFS phase-locking at early stages of auditory processing (a generic sketch of one such phase-locking metric follows these highlights).
We examined whether poor speech-in-noise intelligibility was associated with poor auditory nerve integrity, poor encoding of monaural sTFS cues, generally poor fast temporal processing, and increased utilization of cognitive resources related to effortful listening.
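Neural phase-locking of this kind is commonly quantified as inter-trial phase coherence at the stimulus modulation rate. Below is a minimal generic sketch of that metric, not the paper's recording or analysis pipeline; the sampling rate, modulation frequency, and trial parameters are arbitrary:

```python
import numpy as np

def phase_locking_value(trials, fs, freq):
    """Inter-trial phase coherence at `freq` (Hz).

    trials : array of shape (n_trials, n_samples), e.g. EEG epochs
             time-locked to a modulated stimulus.
    Returns a value in [0, 1]; 1 means perfectly consistent phase.
    """
    n_samples = trials.shape[1]
    freqs = np.fft.rfftfreq(n_samples, d=1 / fs)
    bin_idx = np.argmin(np.abs(freqs - freq))      # FFT bin nearest to freq
    spectra = np.fft.rfft(trials, axis=1)[:, bin_idx]
    phases = spectra / np.abs(spectra)             # unit phase vector per trial
    return np.abs(phases.mean())                   # length of the mean vector

# Example: 200 simulated trials with a partially phase-locked 40 Hz response.
fs, dur, f0 = 1000, 1.0, 40.0
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(1)
trials = (np.sin(2 * np.pi * f0 * t + rng.normal(scale=1.0, size=(200, 1)))
          + rng.normal(scale=2.0, size=(200, len(t))))
print(f"PLV at {f0:.0f} Hz: {phase_locking_value(trials, fs, f0):.2f}")
```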
Summary
Slow fluctuations in the sound pressure envelope are sufficient for accurate speech perception in quiet backgrounds (Shannon et al., 1995). Envelope cues are less useful when speech is embedded in fluctuating backgrounds composed of multiple talkers, environmental noise, or reverberation (Zeng et al., 2005). Under these conditions, segregating a target speech stream from background noise requires accurate encoding of low-frequency spectral and binaural cues contained in the stimulus temporal fine structure (sTFS) (Hopkins and Moore, 2009; Lorenzi et al., 2006).
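The envelope/sTFS distinction is conventionally operationalized with the Hilbert transform: the magnitude of the analytic signal gives the slow envelope, and the cosine of its instantaneous phase gives the fine structure. Below is a minimal single-band Python sketch of that decomposition; vocoder studies such as Shannon et al. (1995) apply it within multiple filter bands, and the carrier and envelope rates here are arbitrary:

```python
import numpy as np
from scipy.signal import hilbert

fs = 16000                                   # sampling rate (Hz)
t = np.arange(0, 0.5, 1 / fs)                # 500 ms of signal

# Toy "speech-like" signal: a 1 kHz carrier under a slow 4 Hz envelope.
carrier = np.sin(2 * np.pi * 1000 * t)
envelope_true = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))
x = envelope_true * carrier

# Hilbert decomposition: envelope = magnitude of the analytic signal,
# temporal fine structure = cosine of its instantaneous phase.
analytic = hilbert(x)
envelope = np.abs(analytic)                  # slow amplitude fluctuations
tfs = np.cos(np.angle(analytic))             # fast fine-structure carrier

# "Envelope-only" speech (as in vocoder studies) discards the TFS by
# re-imposing the extracted envelope on a fixed carrier.
envelope_only = envelope * np.sin(2 * np.pi * 1000 * t)
```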