Abstract

Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.
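
As a concrete sketch of the modeling approach described above, the following code fits a two-component Gaussian mixture to synthetic audiovisual tokens with expectation maximization. The cue dimensions (voice onset time as the acoustic cue, lip aperture as the visual cue) and all numeric values are illustrative assumptions for this sketch, not the paper’s training data or implementation:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)

    # Synthetic audiovisual tokens for two stop categories (e.g., /b/ vs. /p/).
    # Column 0: acoustic cue (VOT, ms); column 1: visual cue (lip aperture).
    cat_a = rng.normal(loc=[10.0, 0.2], scale=[5.0, 0.05], size=(500, 2))
    cat_b = rng.normal(loc=[60.0, 0.5], scale=[10.0, 0.05], size=(500, 2))
    tokens = np.vstack([cat_a, cat_b])

    # Unsupervised category acquisition: fit a 2-component GMM via EM,
    # with no category labels provided.
    gmm = GaussianMixture(n_components=2, covariance_type="full").fit(tokens)

    # Categorize a new, ambiguous audiovisual token via posterior probabilities.
    print(gmm.predict_proba([[35.0, 0.35]]))

Because no labels are supplied, the informativeness of each cue dimension is recovered from the fitted component means and covariances; this is the sense in which cue weights can emerge from distributional statistics alone.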

Highlights

  • There is no question that speech perception is a multimodal process. In face-to-face conversations, the listener receives both visual information from the speaker’s face and acoustic signals from the speaker’s voice.

  • Are the same statistical learning mechanisms that support the acquisition of categories via acoustic cues also used for the acquisition of categories based on visual cues? In this paper, we aim to address these questions by presenting a model of phonetic category acquisition that is trained on data derived from phonetic analyses of visual and auditory speech cues for stop consonants.

  • The main goal of the present study was to address two limitations of previous models: (a) previous audiovisual integration models have not sought to describe the developmental mechanisms that give rise to the changes in cue weighting observed between children and adults; and (b) previous cue integration models that do describe development (e.g., the weighted Gaussian mixture model, WGMM [59]) have focused only on acoustic cues; they have not demonstrated that unsupervised statistical learning is sufficient to acquire these types of audiovisual representations.
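
The reliability-based weighting that distinguishes the WGMM can be illustrated in a few lines. The helper below is a hypothetical sketch of inverse-variance (reliability) cue weighting, not the implementation from [59], and the variances and cue values are invented for the example:

    def reliability_weights(var_aud, var_vis):
        # Weight each cue in proportion to its reliability (inverse variance).
        r_aud, r_vis = 1.0 / var_aud, 1.0 / var_vis
        total = r_aud + r_vis
        return r_aud / total, r_vis / total

    # A noisy acoustic cue (high variance) versus a clear visual cue (low variance):
    w_aud, w_vis = reliability_weights(var_aud=9.0, var_vis=1.0)
    print(w_aud, w_vis)  # 0.1 0.9 -> the percept leans on the visual cue

    # Reliability-weighted fusion of the two cue estimates (arbitrary units):
    fused = w_aud * 42.0 + w_vis * 55.0  # 53.7, pulled toward the visual estimate

Under this scheme, developmental changes in cue weighting fall out of changes in the learner’s estimates of each cue’s variance rather than from any hand-set weights.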

Introduction

There is no question that speech perception is a multimodal process (see [1,2] for reviews). In face-to-face conversations, the listener receives both visual information from the speaker’s face (e.g., their lips, teeth, tongue, and non-mouth facial features) and acoustic signals from the speaker’s voice. In order to use these two sources of information, listeners must combine auditory and visual cues into an integrated percept during spoken language comprehension. A number of studies show that the reliable co-occurrence of synchronous and highly redundant visual and auditory cues supports this ability, leading to accurate speech comprehension by adults [3,4], especially in cases where the auditory signal is degraded due to background noise [5,6,7,8,9]. Mismatched auditory and visual information also influences speech perception, as shown in the McGurk effect [10,11]: listening to the spoken syllable /ba/ while watching a face articulate /ga/ typically yields the percept /da/. The McGurk effect provides clear evidence that visual information is involved in speech perception even when the auditory signal is perfectly intelligible [12].
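
A toy calculation illustrates how multiplicative integration of independent auditory and visual likelihoods (in the spirit of Bayesian and fuzzy-logical accounts of audiovisual integration, not necessarily the model developed in this paper) can produce the McGurk percept; the likelihood values below are hypothetical:

    import numpy as np

    # Conflicting cues: the audio resembles /ba/, the visible articulation /ga/.
    categories = ["ba", "da", "ga"]
    p_aud = np.array([0.6, 0.3, 0.1])  # P(auditory cue | category)
    p_vis = np.array([0.1, 0.3, 0.6])  # P(visual cue | category)

    # Multiply the independent likelihoods and normalize.
    fused = p_aud * p_vis
    fused /= fused.sum()

    for c, p in zip(categories, fused):
        print(f"/{c}/: {p:.2f}")  # /da/ wins (0.43): the classic McGurk percept

Neither cue alone favors /da/, but because /da/ is moderately consistent with both cues, the fused posterior peaks there, mirroring what listeners report.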
