Abstract

Speech comprehension is enhanced when a talker's face is visible, and amplitude modulations in speech support intelligibility. Listeners may benefit from visual speech by extracting amplitude-modulation cues, which are conveyed by the talker's mouth aperture. This multimodal enhancement of speech is often desirable, but visual presentation of a talker's face is not always feasible. The present study investigated the degree to which a “low-fidelity” amplitude-modulation cue (an LED whose luminance varied with the amplitude envelope of the speech) contributes to speech perception for a target signal (a phrase from the Coordinate Response Measure, CRM) presented with two competing CRM speech phrases. Each trial consisted of three simultaneous speech streams, each comprising five sequential CRM phrases. One stream included a target phrase (identified by a preset call sign) and originated from a location directly in front of the listener; the competing streams were placed at ±10° relative to that location. Listeners responded with the color and number associated with the target call sign. The presence of amplitude-modulation cues and target timing cues enhanced performance. Further effects of cue type by signal-to-noise ratio will be discussed, as well as applications and future work.
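The abstract does not specify how the speech amplitude envelope was extracted or mapped to LED luminance; a common approach is to take the Hilbert envelope of the waveform, low-pass filter it, and normalize it to a 0–1 luminance range (e.g., a PWM duty cycle). The sketch below illustrates that general idea only; the function names, cutoff frequency, and filter order are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def speech_envelope(samples, fs, cutoff_hz=30.0):
    """Extract a smoothed amplitude envelope from a mono speech waveform.

    Uses the magnitude of the analytic (Hilbert) signal followed by a
    4th-order low-pass filter; one common choice, assumed here.
    """
    env = np.abs(hilbert(samples))           # instantaneous amplitude
    b, a = butter(4, cutoff_hz / (fs / 2))   # low-pass to smooth the envelope
    return filtfilt(b, a, env)

def envelope_to_luminance(env):
    """Normalize the envelope to 0-1, e.g., to drive an LED's PWM duty cycle."""
    env = env - env.min()
    return env / (env.max() + 1e-12)

# Hypothetical usage with any mono speech recording:
# fs, samples = 44100, np.random.randn(44100)   # placeholder for a CRM phrase
# luminance = envelope_to_luminance(speech_envelope(samples, fs))
```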
