Abstract

The present study examined when and how the ability to cross-modally match audio-visual fluent speech develops in 4.5-, 6-, and 12-month-old German-learning infants. In Experiment 1, 4.5- and 6-month-old infants’ ability to match audio-visual native (German) and non-native (French) fluent speech was assessed by presenting auditory and visual speech information sequentially, that is, in the absence of temporal synchrony cues. The results showed that 4.5-month-old infants were capable of matching native as well as non-native audio and visual speech stimuli, whereas 6-month-olds perceived the audio-visual correspondence of native-language stimuli only. This suggests that intersensory matching narrows for fluent speech between 4.5 and 6 months of age. In Experiment 2, auditory and visual speech information was presented simultaneously, thereby providing temporal synchrony cues. Here, 6-month-olds were found to match native as well as non-native speech, indicating that temporal synchrony cues facilitate the intersensory perception of non-native fluent speech. Intriguingly, despite the fact that the audio and visual stimuli cohered temporally, 12-month-olds matched the non-native language only. Results are discussed with regard to multisensory perceptual narrowing during the first year of life.

Highlights

  • From birth on, infants experience a multisensory world where they are required to process information presented in more than one sensory modality, for example, the auditory and visual speech information emanating from the face of a speaker

  • The multimodality of speech is typically evidenced by the McGurk effect, in which conflicting auditory and visual speech information for syllables leads to illusory percepts in adults and children, indicating audio-visual speech integration [1]

  • The inconsistent findings between Experiment 1a and Lewkowicz and Pons’ study [12] are likely due to the use of different familiarization times. It seems that 6-month-old infants need a sufficient amount of time to encode the auditory language input in order to be able to match it to the visual speech information

Introduction

Infants experience a multisensory world where they are required to process information presented in more than one sensory modality, for example, the auditory and visual speech information emanating from the face of a speaker. The multimodality of speech is typically evidenced by the McGurk effect, in which conflicting auditory and visual speech information leads to illusory percepts; McGurk-type effects have even been found in 4.5-month-old infants [2,3,4]. It is still not fully understood when and how infants master the task of matching speech information from different modalities. When visual and auditory speech information is presented simultaneously in an intermodal matching task, infants have been observed to match vowels audio-visually from 2 months of age [5,6,7,8,9,10]. When visual and auditory stimuli are presented sequentially, that is, across a temporal delay, 6-month-olds were shown to match isolated auditory and visual attributes of syllables, indicating that temporal synchrony is not essential for infants at that age to match audio and visual speech information [11]. One of the few studies addressing this issue suggests that the intersensory response to audio-visual fluent speech emerges late in infancy and is restricted to native language input [12].
