Abstract

Several behavioural studies have shown that the interplay between voice and face information in audiovisual speech perception is not universal. Native English speakers (ESs) are influenced by visual mouth movements to a greater degree than native Japanese speakers (JSs) when listening to speech. However, the biological basis of these group differences is unknown. Here, we characterize the time course of these group differences using event-related brain potentials (ERPs) and eye gaze during audiovisual and audio-only speech perception. On a behavioural level, congruent mouth movement shortened the ESs’ response times for speech perception, whereas the opposite effect was observed in the JSs. Eye-tracking data revealed a gaze bias towards the mouth for the ESs but not the JSs, especially before the audio onset. Additionally, the ERP P2 amplitude indicated that the ESs processed multisensory speech more efficiently than auditory-only speech, whereas the JSs exhibited the opposite pattern. Taken together, the ESs’ early visual attention to the mouth likely promoted phonetic anticipation, which was not the case for the JSs. These results clearly indicate the impact of language and/or culture on multisensory speech processing, suggesting that linguistic/cultural experiences lead to the development of unique neural systems for audiovisual speech perception.

Highlights

  • In face-to-face speech perception, what we hear is influenced by visual information of articulatory movements

  • We aimed to clarify how Japanese and English speakers differ in processing AV speech as a speech event proceeds

  • We investigated whether visual speech facilitates the auditory speech processing of Japanese perceivers, as reported for European-language perceivers19–22

Introduction

In face-to-face speech perception, what we hear is influenced by visual information from articulatory movements. Auditory /ba/ combined with visual mouth movements for /ga/ is often perceived as ‘da’ or ‘tha’. While this audiovisual (AV) integration of speech cues is robust for adult native speakers of English and other European languages, the size of the McGurk effect is known to be much smaller in Japanese perceivers. A difference in the size of the McGurk effect has also been observed between children and adults in European-language participants, with reports indicating that children rely more on auditory information than adults do in face-to-face speech perception. Based on the results of prior studies,17,18 we hypothesized that gaze differences would be observed in AV speech perception. By combining ERP and eye-tracking measures, we aimed to uncover the differences in AV speech perception between Japanese and English speakers.

