Abstract
The ability to navigate "cocktail party" situations by focusing on sounds of interest over irrelevant background sounds is often considered in terms of cortical mechanisms. However, subcortical circuits such as the pathway underlying the medial olivocochlear (MOC) reflex modulate the activity of the inner ear itself, supporting the extraction of salient features from the auditory scene prior to any cortical processing. To understand the contribution of auditory subcortical nuclei and the cochlea in complex listening tasks, we made physiological recordings along the auditory pathway while listeners engaged in detecting non(sense) words in lists of words. Both naturally spoken and intrinsically noisy, vocoded speech (filtering that mimics processing by a cochlear implant, CI) significantly activated the MOC reflex, but this was not the case for speech in background noise, which more engaged midbrain and cortical resources. A model of the initial stages of auditory processing reproduced specific effects of each form of speech degradation, providing a rationale for goal-directed gating of the MOC reflex based on enhancing the representation of the energy envelope of the acoustic waveform. Our data reveal the coexistence of 2 strategies in the auditory system that may facilitate speech understanding in situations where the signal is either intrinsically degraded or masked by extrinsic acoustic energy. Whereas intrinsically degraded streams recruit the MOC reflex to improve the representation of speech cues peripherally, extrinsically masked streams rely more on higher auditory centres to denoise signals.
Highlights
Robust cocktail party listening, the ability to focus on a single talker in a background of simultaneous, overlapping conversations, is critical to human communication and a long-sought goal of hearing technologies [1,2]
We assessed speech intelligibility—the ability to discriminate between Australian-English words and non-words—when speech was degraded by 3 different manipulations: noise vocoding the entire speech waveform; adding 8-talker babble noise (BN) to "natural" speech in quiet; or adding speech-shaped noise (SSN) to "natural" speech
We examined the effects of efferent feedback on the encoding of temporal fine structure (TFS, the instantaneous pressure waveform of a sound)—a stimulus cue at low frequencies associated with speech understanding [50,88,89,90,91,92]—and observed a small, mean improvement in the model with the medial olivocochlear (MOC) reflex at low frequencies
Summary
The ability to focus on a single talker in a background of simultaneous, overlapping conversations is critical to human communication and a long-sought goal of hearing technologies [1,2]. Problems listening in background noise are a key complaint of many listeners with even mild hearing loss and a stated factor in the non-use and non-uptake of hearing technologies.