Abstract

Following a conversation in a noisy environment is not always easy. To distinguish between two speakers, a listener must mobilize many perceptual and cognitive processes to maintain attention on a target voice and to avoid shifting attention to the background noise. This work introduces the Long-SWoRD test, an intelligibility task built around long stimuli. The protocol allows participants to draw fully on cognitive resources, such as semantic knowledge, to separate two talkers in a realistic listening environment. Moreover, the task also provides experimenters with a means to infer fluctuations in auditory selective attention. Two experiments document the performance of normal-hearing listeners in situations where the perceptual separability of the competing voices ranges from easy to hard, using a combination of voice and binaural cues. The results show a strong effect of voice differences when the voices are presented diotically. In addition, analyzing the influence of semantic context on the pattern of responses indicates that semantic information induces a response bias both when the competing voices are distinguishable and when they are indistinguishable from one another.

Highlights

  • When a listener attends to one of two speech streams corresponding to two different voices, the cortical responses measured using intracortical recordings, electroencephalography (EEG), or magnetoencephalography (MEG) follow the temporal envelope of the attended stream more strongly than that of the unattended stream (Ding and Simon, 2012; Mesgarani and Chang, 2012; O'Sullivan et al., 2015)

  • The participants had better scores when the distance between the voices increased [b = 2.31; standard error (SE) = 0.15; z = 15.72; p < 0.001], as well as when the local target-to-masker ratio (TMR) was higher [b = 3.50; SE = 0.38; z = 9.22; p < 0.001]. The interaction between these two factors was not significant [b = 0.94; SE = 1.21; z = 0.77; p = 0.44]. The results of this additional analysis indicate that the local TMR does influence the participants' errors, but only when the stimuli are presented diotically

  • In the dichotic condition, it appears that the binaural cue is so strong that the small local TMR fluctuations are irrelevant
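The coefficients reported above (b, SE, z, p) are the output of a trial-level logistic regression of response accuracy on voice distance and local TMR. As a minimal sketch of how such an analysis is set up, the example below fits a plain logistic regression with statsmodels on synthetic data; the predictor names, effect sizes, and data are illustrative assumptions only (the actual study's models also included per-participant random effects, which are omitted here for simplicity).

```python
# Illustrative sketch only: synthetic trial-level data standing in for the
# study's responses. "distance" and "tmr" are hypothetical predictor names;
# the coefficients used to generate the data merely echo the reported
# direction of the effects (both positive).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
distance = rng.uniform(0.0, 1.0, n)   # normalized voice-difference cue (assumed scale)
tmr = rng.uniform(-0.5, 0.5, n)       # local target-to-masker ratio (assumed units)

# Generate binary correct/incorrect responses from an assumed logistic model
logit_p = -0.5 + 2.3 * distance + 3.5 * tmr
correct = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_p)))

df = pd.DataFrame({"correct": correct, "distance": distance, "tmr": tmr})

# Logistic regression with a distance-by-TMR interaction; the fitted model
# exposes the coefficient (b), SE, z, and p for each term.
model = smf.logit("correct ~ distance * tmr", data=df).fit(disp=0)
print(model.summary().tables[1])
```

In a full analysis of repeated-measures data like this, a generalized linear mixed model (with random intercepts per participant) would be used instead of the pooled regression shown here.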


Introduction

When a listener attends to one of two speech streams corresponding to two different voices, the cortical responses measured using intracortical recordings, electroencephalography (EEG), or magnetoencephalography (MEG) follow the temporal envelope of the attended stream more strongly than that of the unattended stream (Ding and Simon, 2012; Mesgarani and Chang, 2012; O'Sullivan et al., 2015). Informal reports from others, in addition to our own experiences as participants in tasks using such long stimuli, strongly suggest that this constant-attention assumption may not be warranted. Rather, it appears that for many listeners, maintaining one's undivided attention on a single speech stream for several tens of seconds (such as listening to a voice telling a story), while another speech stream is played concurrently at approximately the same sound level, places demands on the listener's focus that can lead to attentional shifts from the target to the nontarget voice. Whereas verbal working memory has been shown to play a role in speech-on-speech perception (see Besser et al., 2013, for a review), little is known about the effect that long, coherent stimuli, which more closely resemble a real-life communication situation than short, isolated sentences, may have on working memory. Working memory could run out of capacity, or else the presence of continuous interfering speech could create challenges in storing information in memory.

