Abstract

According to the Bayesian framework of multisensory integration, audiovisual stimuli associated with a stronger prior belief that they share a common cause (i.e., causal prior) are predicted to result in a greater degree of perceptual binding and therefore greater audiovisual integration. In the present psychophysical study, we systematically manipulated the causal prior while keeping sensory evidence constant. We paired auditory and visual stimuli during an association phase to be spatiotemporally either congruent or incongruent, with the goal of driving the causal prior in opposite directions for different audiovisual pairs. Following this association phase, every pairwise combination of the auditory and visual stimuli was tested in a typical ventriloquism-effect (VE) paradigm. The size of the VE (i.e., the shift of auditory localization towards the spatially discrepant visual stimulus) indicated the degree of multisensory integration. Results showed that exposure to an audiovisual pairing as spatiotemporally congruent compared to incongruent resulted in a larger subsequent VE (Experiment 1). This effect was further confirmed in a second VE paradigm, where the congruent and the incongruent visual stimuli flanked the auditory stimulus, and a VE in the direction of the congruent visual stimulus was shown (Experiment 2). Since the unisensory reliabilities for the auditory or visual components did not change after the association phase, the observed effects are likely due to changes in multisensory binding by association learning. As suggested by Bayesian theories of multisensory processing, our findings support the existence of crossmodal causal priors that are flexibly shaped by experience in a changing world.

Highlights

  • The present study investigated how prior experience with auditory and visual stimuli as either co-occurring or never co-occurring in space and time mediates the degree of audiovisual integration.

  • According to the Bayesian causal inference model (Körding et al., 2007), changes in either the unimodal reliabilities or the crossmodal causal prior could alter the degree of multisensory integration (see the sketch below).
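
A brief sketch of the model's core computation may help (the notation below is ours, following the standard formulation of the model, not the present study's exact parameterization). Given noisy auditory and visual spatial measurements $x_A$ and $x_V$, an ideal observer combines the causal prior $p(C=1)$ with the sensory evidence to form a posterior over a common cause,

$$ p(C=1 \mid x_A, x_V) = \frac{p(x_A, x_V \mid C=1)\, p(C=1)}{p(x_A, x_V \mid C=1)\, p(C=1) + p(x_A, x_V \mid C=2)\,\bigl(1 - p(C=1)\bigr)} , $$

and, under the model-averaging read-out adopted by Körding et al. (2007), reports an auditory estimate that mixes the fused ($C=1$) and segregated ($C=2$) estimates according to that posterior:

$$ \hat{s}_A = p(C=1 \mid x_A, x_V)\, \hat{s}_{A,C=1} + \bigl(1 - p(C=1 \mid x_A, x_V)\bigr)\, \hat{s}_{A,C=2} . $$

Raising $p(C=1)$ while the likelihoods stay fixed therefore pulls $\hat{s}_A$ toward the visually attracted fused estimate; this is the route by which association learning could change the ventriloquism effect.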

Introduction

When talking with multiple speakers in a noisy environment, we combine the sound of a voice and the sight of moving lips to identify who is speaking, which typically increases the intelligibility of his or her words (Grant & Seitz, 2001; Schwartz, Berthommier, & Savariaux, 2004). To combine cues appropriately, the perceptual system must first infer whether they originate from a common cause. One statistical pattern that is informative about this latent causal structure is the spatial and temporal relationship between the cues (Ernst, 2007; Parise, 2016; Spence, 2011): cues from a common origin tend to coincide or be proximate in space and time, whereas cues from different origins tend to be spatially separate and temporally uncorrelated. It is therefore a reasonable strategy for observers to rely on spatiotemporal patterns to determine when and to what extent different cues should be integrated. When different cues are presented close to each other in space and time, observers are more likely to ascribe a common underlying cause to them and integrate the cues, whereas perceptual integration breaks down when the spatial or temporal discrepancies between the cues exceed a certain degree (Ernst & Bülthoff, 2004; Parise et al., 2012; Slutsky & Recanzone, 2001).
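
To make the causal-inference account concrete, the following is a minimal, self-contained simulation in the spirit of Körding et al. (2007). It is an illustrative sketch only: the function names, the central spatial prior, and all parameter values (cue noise levels, candidate causal priors) are our assumptions for demonstration, not quantities estimated in the present experiments.

    # Minimal Bayesian causal inference simulation of the ventriloquism effect
    # (generative model as in Körding et al., 2007). All parameter values are
    # illustrative assumptions, not estimates from the present study.
    import math

    def posterior_common(x_a, x_v, sig_a, sig_v, sig_p, p_common):
        """P(common cause | measurements), with a central spatial prior N(0, sig_p^2)."""
        va, vv, vp = sig_a ** 2, sig_v ** 2, sig_p ** 2
        # Marginal likelihood under one common source (source location integrated out).
        d1 = va * vv + va * vp + vv * vp
        like1 = (math.exp(-0.5 * ((x_v - x_a) ** 2 * vp + x_v ** 2 * va + x_a ** 2 * vv) / d1)
                 / (2 * math.pi * math.sqrt(d1)))
        # Marginal likelihood under two independent sources.
        like2 = (math.exp(-0.5 * (x_v ** 2 / (vv + vp) + x_a ** 2 / (va + vp)))
                 / (2 * math.pi * math.sqrt((vv + vp) * (va + vp))))
        return like1 * p_common / (like1 * p_common + like2 * (1 - p_common))

    def auditory_estimate(x_a, x_v, sig_a, sig_v, sig_p, p_common):
        """Model-averaged auditory location estimate (fused vs. segregated)."""
        w = posterior_common(x_a, x_v, sig_a, sig_v, sig_p, p_common)
        pa, pv, pp = sig_a ** -2, sig_v ** -2, sig_p ** -2
        s_fused = (x_a * pa + x_v * pv) / (pa + pv + pp)  # C = 1: weight both cues
        s_seg = (x_a * pa) / (pa + pp)                    # C = 2: auditory cue only
        return w * s_fused + (1 - w) * s_seg

    # Identical sensory evidence, different causal priors (cf. Experiment 1):
    x_a, x_v = 0.0, 10.0                  # degrees; visual cue 10 deg to the right
    sig_a, sig_v, sig_p = 8.0, 2.0, 20.0  # auditory noisier than visual
    for p_c in (0.2, 0.8):                # "incongruent" vs. "congruent" pairing
        shift = auditory_estimate(x_a, x_v, sig_a, sig_v, sig_p, p_c) - x_a
        print(f"p_common = {p_c:.1f}: predicted ventriloquism shift = {shift:+.2f} deg")

    # Integration weakens as the spatial discrepancy between the cues grows:
    for disp in (5.0, 20.0, 40.0):
        w = posterior_common(0.0, disp, sig_a, sig_v, sig_p, 0.5)
        print(f"disparity = {disp:4.0f} deg: P(common cause) = {w:.2f}")

With these illustrative numbers, the same 10-degree audiovisual discrepancy yields a predicted shift of roughly 2 degrees under a low causal prior but close to 8 degrees under a high one, and the posterior probability of a common cause drops toward zero at large disparities, mirroring the breakdown of integration described above.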
