Abstract

During speech perception, humans integrate auditory information from the voice with visual information from the face. This multisensory integration increases perceptual precision, but only if the two cues come from the same talker; this requirement has been largely ignored by current models of speech perception. We describe a generative model of multisensory speech perception that includes this critical step of determining the likelihood that the voice and face information have a common cause. A key feature of the model is that it is based on a principled analysis of how an observer should solve this causal inference problem using the asynchrony between the two cues and the reliability of the cues. This allows the model to make predictions about the behavior of subjects performing a synchrony judgment task, predictive power that does not exist in other approaches, such as post-hoc fitting of Gaussian curves to behavioral data. We tested the model predictions against the performance of 37 subjects performing a synchrony judgment task while viewing audiovisual speech under a variety of manipulations, including varying asynchronies, intelligibility, and visual cue reliability. The causal inference model outperformed the Gaussian model across two experiments, providing a better fit to the behavioral data with fewer parameters. Because the causal inference model is derived from a principled understanding of the task, model parameters are directly interpretable in terms of stimulus and subject properties.
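
To make the comparison concrete, the sketch below shows what post-hoc fitting of a Gaussian curve to synchrony-judgment data typically looks like. This is an illustration, not code or data from the study: the SOA values, response proportions, and parameter names are hypothetical.

```python
# Baseline approach: fit a Gaussian to the proportion of "synchronous" responses
# at each audiovisual asynchrony (SOA). All values below are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def gaussian_psychometric(soa_ms, amplitude, mu_ms, sigma_ms):
    """Proportion of 'synchronous' reports as a Gaussian function of SOA."""
    return amplitude * np.exp(-0.5 * ((soa_ms - mu_ms) / sigma_ms) ** 2)

# Hypothetical group data: negative SOA = auditory lead, positive = visual lead.
soas = np.array([-400, -300, -200, -100, 0, 100, 200, 300, 400], dtype=float)
p_sync = np.array([0.05, 0.15, 0.45, 0.80, 0.95, 0.90, 0.70, 0.35, 0.10])

params, _ = curve_fit(gaussian_psychometric, soas, p_sync, p0=[1.0, 50.0, 150.0])
amplitude, mu_ms, sigma_ms = params
print(f"peak = {amplitude:.2f}, center = {mu_ms:.0f} ms, width (SD) = {sigma_ms:.0f} ms")
```

Because the fitted amplitude, center, and width are purely descriptive, they summarize the data but do not by themselves predict how performance should change when stimulus or subject properties change.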

Highlights

  • When an observer hears a voice and sees mouth movements, there are two potential causal structures (Figure 1A)

  • The causal inference of multisensory speech (CIMS) model makes trial-to-trial behavioral predictions about synchrony perception using a limited number of parameters that capture physical properties of speech, the sensory noise of the subject, and the subject’s prior assumptions about the causal structure of the stimuli in the experiment

  • Audiovisual spatial localization likely occurs in the parietal lobe (Zatorre et al., 2002), while multisensory speech perception is thought to occur in the superior temporal sulcus (Beauchamp et al., 2004)

Introduction

When an observer hears a voice and sees mouth movements, there are two potential causal structures (Figure 1A). In the first causal structure, the events have a common cause (C = 1): a single talker produces both the auditory voice and the seen mouth movements. In the second causal structure, the events have two different causes (C = 2): one talker produces the auditory voice and a different talker produces the seen mouth movements. A critical step in audiovisual integration during speech perception is estimating the likelihood that the speech arises from a single talker. This process, known as causal inference (Kording et al., 2007; Schutz and Kubovy, 2009; Shams and Beierholm, 2010; Buehner, 2012), has provided an excellent tool for understanding the behavioral properties of tasks requiring spatial localization of simple auditory beeps and visual flashes (Kording et al., 2007; Sato et al., 2007). We set out to determine whether the causal inference model could explain the behavior of humans perceiving multisensory speech.
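
The sketch below illustrates this causal inference step for a single trial under simplifying assumptions: Gaussian sensory noise on the measured asynchrony, small physical asynchronies when the cues share a cause, and a broad uniform range of asynchronies when they do not. It is a generic illustration of the approach, not the published CIMS model; all parameter names and values are hypothetical.

```python
# Posterior probability that voice and face share a common cause (C = 1),
# given a noisy measurement of audiovisual asynchrony. Illustrative only.
import numpy as np

def posterior_common_cause(measured_soa_ms,
                           sigma_sensory_ms=80.0,   # sensory noise on the asynchrony measurement
                           sigma_common_ms=60.0,    # spread of physical asynchronies when C = 1
                           soa_range_ms=1000.0,     # range of possible asynchronies when C = 2
                           prior_common=0.5):       # prior belief that the cues share a cause
    """Bayesian comparison of C = 1 versus C = 2 for one trial."""
    # Likelihood under C = 1: true asynchrony near zero, blurred by sensory noise.
    var_c1 = sigma_sensory_ms**2 + sigma_common_ms**2
    like_c1 = np.exp(-0.5 * measured_soa_ms**2 / var_c1) / np.sqrt(2 * np.pi * var_c1)
    # Likelihood under C = 2: the measured asynchrony could fall anywhere in a broad range.
    like_c2 = 1.0 / soa_range_ms
    # Posterior probability of a common cause (Bayes' rule over the two structures).
    num = like_c1 * prior_common
    return num / (num + like_c2 * (1.0 - prior_common))

for soa_ms in (50.0, 400.0):
    print(f"measured SOA = {soa_ms:.0f} ms -> P(C = 1) = {posterior_common_cause(soa_ms):.2f}")
```

With these illustrative settings, a measured asynchrony of 50 ms yields a posterior of about 0.8 in favor of a common cause, while a 400 ms asynchrony drives it near zero, so the observer would attribute the voice and mouth movements to different sources.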
