Abstract

In audiovisual (AV) speech perception, the integration of the auditory and visual streams generally results in their fusion into a single percept. One classical example is the McGurk effect, in which incongruent auditory and visual speech signals may lead to a fused percept different from either the visual or the auditory input. In a previous set of experiments, we showed that if a McGurk stimulus is preceded by an incongruent AV context (composed of incongruent auditory and visual speech materials), the amount of McGurk fusion is largely decreased. We interpreted this result within a two-stage “binding and fusion” model of AV speech perception, in which an early AV binding stage controls the fusion/decision process and is likely to produce “unbinding”, and hence less fusion, when the context is incoherent. To provide further electrophysiological evidence for this binding/unbinding stage, early auditory evoked N1/P2 responses were here compared during auditory-only, congruent AV, and incongruent AV speech perception, following either a coherent or an incoherent AV context. Following the coherent context, and in line with previous electroencephalographic/magnetoencephalographic studies, visual information in the congruent AV condition was found to modify auditory evoked potentials, with a decrease in P2 latency compared with the auditory-only condition. Importantly, both P2 amplitude and P2 latency in the congruent AV condition increased from the coherent to the incoherent context. Although potential contamination by visual responses from the visual cortex cannot be ruled out, our results might provide a neurophysiological correlate of an early binding/unbinding process acting on AV interactions.
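For illustration only, the sketch below shows one common way N1/P2 peak latency and amplitude can be extracted from an averaged ERP waveform and tabulated per condition. It is a minimal, hypothetical example (simulated data, assumed sampling rate, electrode, and search windows) and is not the analysis pipeline used in the study.

```python
import numpy as np

# Minimal, hypothetical sketch: extract N1/P2 peak latency and amplitude from
# an averaged ERP waveform (e.g., a fronto-central electrode such as Cz) and
# report them per condition. Sampling rate, epoch limits and search windows
# are illustrative assumptions, not the parameters of the study.
FS = 500.0                                  # assumed sampling rate (Hz)
TIMES = np.arange(-0.1, 0.5, 1.0 / FS)      # epoch: -100 to +500 ms

def peak_in_window(erp, times, tmin, tmax, polarity):
    """Latency (s) and amplitude of the extreme sample of the requested
    polarity ('neg' for N1, 'pos' for P2) within [tmin, tmax]."""
    mask = (times >= tmin) & (times <= tmax)
    window = erp[mask]
    idx = np.argmin(window) if polarity == "neg" else np.argmax(window)
    return times[mask][idx], window[idx]

def n1_p2_measures(erp, times):
    n1_lat, n1_amp = peak_in_window(erp, times, 0.07, 0.15, "neg")
    p2_lat, p2_amp = peak_in_window(erp, times, 0.15, 0.28, "pos")
    return {"N1": (n1_lat, n1_amp), "P2": (p2_lat, p2_amp)}

# Usage with simulated grand-average waveforms (in microvolts), one per
# condition of interest (auditory-only vs. congruent AV, after a coherent
# vs. an incoherent context).
rng = np.random.default_rng(0)
conditions = ["A_coherent", "AV_coherent", "A_incoherent", "AV_incoherent"]
erp_by_condition = {name: rng.normal(0.0, 1.0, TIMES.size) for name in conditions}
for name, erp in erp_by_condition.items():
    print(name, n1_p2_measures(erp, TIMES))
```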

Highlights

  • Speech perception requires adequate hearing and listening skills, but it is well known that visual information from the face and from lip movements may intervene in the speech decoding process

  • Recent electroencephalographic (EEG) and magnetoencephalographic (MEG) studies focused on the influence of the visual input on the auditory event-related potentials (ERPs), notably on auditory N1 and P2 responses considered to be associated with the processing of the physical and featural attributes of the auditory speech stimulus prior to its categorization (Näätänen and Winkler, 1999)

  • In the last 10 years, various studies have consistently reported an amplitude reduction of auditory N1/P2 responses, together with a decrease in their latency, when speech is presented audiovisually rather than in the auditory modality alone. These studies typically involved consonant–vowel syllables uttered in isolation, with a natural advance of the visual input over the sound. Their results suggest that the visual input modulates and speeds up the neural processing indexed by auditory ERPs as early as 100 ms after sound onset, and that AV integration partly occurs at an early stage of the cortical auditory speech processing hierarchy (Besle et al., 2004; van Wassenhove et al., 2005; Stekelenburg and Vroomen, 2007; Arnal et al., 2009; Pilling, 2009; Vroomen and Stekelenburg, 2010; Baart et al., 2014; Knowland et al., 2014; Treille et al., 2014a,b). A minimal sketch of how such within-subject latency differences might be tested is given after this list
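As a purely illustrative companion to the group-level findings summarized above, the following sketch shows how a within-subject comparison of P2 latencies between auditory-only and AV conditions might be run. The data are simulated, and the paired t-test is only one common choice of statistical test; nothing here reproduces the analyses of the cited studies.

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical per-subject P2 peak latencies (in seconds), one value per
# participant and condition; in practice these would come from applying a
# peak-extraction step to each participant's averaged ERP.
rng = np.random.default_rng(1)
n_subjects = 16
p2_lat_auditory = rng.normal(0.200, 0.015, n_subjects)              # auditory-only
p2_lat_av = p2_lat_auditory - rng.normal(0.010, 0.005, n_subjects)  # AV, slightly earlier

# Paired comparison: is P2 latency shorter with visual input, within subjects?
t_stat, p_value = ttest_rel(p2_lat_auditory, p2_lat_av)
shift_ms = np.mean(p2_lat_auditory - p2_lat_av) * 1000.0
print(f"mean latency shift = {shift_ms:.1f} ms, "
      f"t({n_subjects - 1}) = {t_stat:.2f}, p = {p_value:.3g}")
```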

Introduction

Speech perception requires adequate hearing and listening skills, but it is well known that visual information from the face and from lip movements may intervene in the speech decoding process. The first classical evidence for audiovisual (AV) integration in speech perception in normal-hearing subjects concerns the role of lip reading during speech comprehension, with a gain for the AV modality with respect to the audio-only modality in adverse listening conditions (e.g., Sumby and Pollack, 1954; Erber, 1971; Benoît et al., 1994; Grant and Seitz, 2000; Bernstein et al., 2004b). Another classical behavioral example of AV integration is provided by the McGurk effect (McGurk and MacDonald, 1976), in which a conflicting visual input modifies the perception of an auditory input (e.g., a visual /ga/ added to an auditory /ba/ leading to the percept /da/). The interpretation has generally called upon “predictive mechanisms” (van Wassenhove et al., 2005), according to which the visual input, arriving ahead of the sound, would make it possible to predict part of its content and to modulate the auditory ERP in amplitude.
