Abstract
Several behavioural studies have shown that the interplay between voice and face information in audiovisual speech perception is not universal. Native English speakers (ESs) are influenced by visual mouth movement to a greater degree than native Japanese speakers (JSs) when listening to speech. However, the biological basis of these group differences is unknown. Here, we demonstrate the time-varying processes of group differences in terms of event-related brain potentials (ERPs) and eye gaze for audiovisual and audio-only speech perception. On a behavioural level, while congruent mouth movement shortened the ESs’ response time for speech perception, the opposite effect was observed in JSs. Eye-tracking data revealed a gaze bias to the mouth for the ESs but not the JSs, especially before the audio onset. Additionally, the ERP P2 amplitude indicated that ESs processed multisensory speech more efficiently than auditory-only speech; however, the JSs exhibited the opposite pattern. Taken together, the ESs’ early visual attention to the mouth was likely to promote phonetic anticipation, which was not the case for the JSs. These results clearly indicate the impact of language and/or culture on multisensory speech processing, suggesting that linguistic/cultural experiences lead to the development of unique neural systems for audiovisual speech perception.
Highlights
In face-to-face speech perception, what we hear is influenced by visual information of articulatory movements.
We aimed to clarify how Japanese and English speakers differ in processing AV speech as a speech event proceeds.
We investigated whether visual speech facilitates the auditory speech processing of Japanese perceivers, as reported in European language perceivers [19–22].
Summary
In face-to-face speech perception, what we hear is influenced by visual information of articulatory movements. Auditory /ba/ combined with visual mouth movements for /ga/ is often perceived as ‘da’ or ‘tha’, a phenomenon known as the McGurk effect. While this audiovisual (AV) integration of speech cues is robust for adult native speakers of English and other European languages, the size of the McGurk effect is known to be much smaller in Japanese perceivers. A difference in the size of the McGurk effect has also been observed between children and adults among European-language participants, with reports indicating that children rely more on auditory information than adults do in face-to-face speech perception. Based on the results of prior studies [17,18], we hypothesized that gaze differences would be observed in AV speech perception. By combining ERP recording and eye tracking, we aimed to uncover the differences in AV speech perception between Japanese and English speakers.