Abstract

When hearing and seeing a person speak, people receive both auditory and visual speech information. The contribution of visual speech information has been demonstrated in a wide variety of conditions, most clearly when conflicting auditory and visual information is presented. This study investigated which aspects of the face most strongly influence audio-visual speech perception. The visual stimulus was manipulated using special effects techniques to isolate three specific "articulatory parts": lips only, oral cavity only, or jaw only. These "parts" and their combinations were dubbed with auditory tokens to create "fusion" stimuli (A/aba/ + V/aga/) and "combination" stimuli (A/aga/ + V/aba/). Results indicated that visual information from jaw-only movements was not sufficient to induce illusory effects. For the combination condition, however, seeing moving lips or the inside of the speaker's mouth produced substantial audio-visual effects, and additional visual information from other articulators did not significantly increase the effect. In the fusion condition, both the lips and the oral cavity were necessary to obtain illusory responses; individually, each produced very few. The results suggest that visual information from the lips and oral cavity together is sufficient to influence auditory speech processing. [Research supported by NICHD.]
