Abstract
Speech perception often benefits from seeing the speaker's lip movements when they are available. One potential mechanism underlying this perceptual gain from audiovisual integration is on-line prediction. In this study we address whether preceding speech context in a single modality can improve audiovisual processing and whether any such improvement relies on on-line transfer of information across sensory modalities. In the experiments presented here, during each trial a speech fragment (context) presented in a single sensory modality (voice or lips) was immediately continued by an audiovisual target fragment. Participants made speeded judgments about whether voice and lips were in agreement in the target fragment. The leading single-modality context and the subsequent audiovisual target fragment could be continuous in one modality only, in both modalities (the context in one modality continued into both modalities in the target), or in neither modality (i.e., discontinuous). The results showed quicker audiovisual matching responses when the context was continuous with the target within either the visual or the auditory channel (Experiment 1). Critically, prior visual context also provided an advantage when it was cross-modally continuous (with the auditory channel in the target), whereas auditory-to-visual cross-modal continuity yielded no advantage (Experiment 2). This suggests that visual speech information can provide an on-line benefit for processing the upcoming auditory input through the use of predictive mechanisms. We hypothesize that this benefit is expressed at an early level of speech analysis.
Highlights
Perceptual information from different sensory systems is often combined to achieve a robust representation of events in the external world [1].
In both the visual and the auditory versions, participants detected audiovisual mismatch in the target more rapidly following a leading informative context than following no context. This supports the hypothesis that on-line speech perception benefits from advance information in both the visual and the auditory modalities.
This study offers behavioral evidence that listeners can use speech information on-line to constrain the interpretation of the subsequent signal within and across sensory modalities, thereby benefiting performance in an audiovisual speech matching task.
Summary
Perceptual information from different sensory systems is often combined to achieve a robust representation of events in the external world [1]. Research during the past two decades has documented numerous instances of multisensory interactions at the neuronal and behavioral levels (see [2]). These interactions are demonstrated, for example, by the McGurk effect, in which listening to the spoken syllable /ba/ while simultaneously watching lip movements corresponding to the syllable /ga/ often results in the illusory perception of /da/ [3]. When visual and acoustic speech signals are correlated, the benefits of multisensory integration for speech perception are well documented (e.g., [4], [5]). However, the mechanisms that enable this cross-modal benefit are still not well understood.