Abstract

In face-to-face communication, auditory and visual speech cues are readily integrated, as demonstrated by the McGurk effect. We investigated these integration processes by measuring brain activity (fMRI) and reaction times. The subjects were 10 native speakers of Japanese. The stimuli were the syllables /ba/, /da/, and /ga/ uttered by three female talkers. The audiovisual stimuli were McGurk-type stimuli consisting of discrepant auditory and visual syllables (e.g., audio /ba/ combined with video /da/ or /ga/). We compared brain activity during audiovisual speech perception under two sets of conditions that differed in the intelligibility of the auditory component of the speech. In each condition, the subjects were asked to identify the spoken syllables. When the auditory speech was intelligible, a brain area involved in visual motion processing was quiet, whereas the same area was active when the speech was harder to hear. Thus, visual information about the mouth movements was processed more intensively when the speech was harder to hear. Reaction times were faster in the low-intelligibility condition than in the high-intelligibility condition, suggesting top-down suppression of visual processing when the auditory speech was intelligible. The integration processes thus appear to involve actively finding the optimal weighting of the different modalities under the given circumstances.
