Abstract

In audiovisual speech perception, visual information from a talker's face during mouth articulation is available before the onset of the corresponding audio speech, allowing the perceiver to use visual information to predict the upcoming audio. This prediction from phonetically congruent visual information modulates audiovisual speech perception and leads to a decrease in N1 and P2 amplitudes and latencies compared to the perception of audio speech alone. Whether audiovisual experience, such as musical training, influences this prediction is unclear, but if it does, it may explain some of the variation observed in previous research. The current study addresses whether audiovisual speech perception is affected by musical training, first assessing N1 and P2 event-related potentials (ERPs) and then inter-trial phase coherence (ITPC). Musicians and non-musicians were presented with the syllable /ba/ in audio-only (AO), video-only (VO), and audiovisual (AV) conditions. With the predictive effect of mouth movement isolated from the AV speech (AV−VO), results showed that, compared to audio speech, both groups had shorter N1 latency and reduced P2 amplitude and latency. Both groups also showed lower ITPC in the delta, theta, and beta bands during audiovisual speech perception. However, musicians showed significant suppression of N1 amplitude and desynchronization in the alpha band in audiovisual speech that were not present for non-musicians. Collectively, the current findings indicate that early sensory processing can be modified by musical experience, which in turn may explain some of the variation in previous AV speech perception research.
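
A note on the measure mentioned above: inter-trial phase coherence (ITPC) at a given frequency and time point is the magnitude of the average unit phase vector across trials, |(1/N) Σ_n e^{iφ_n}|, ranging from 0 (random phase across trials) to 1 (perfect phase alignment). The sketch below is a minimal, illustrative way to compute band-wise ITPC with NumPy/SciPy; it is not the authors' pipeline, and the sampling rate, band edges, and array shapes are assumptions made only for the example.

```python
# Minimal illustrative sketch (not the study's actual pipeline): compute
# inter-trial phase coherence (ITPC) for one EEG channel from an array of epochs.
# Assumed inputs: `trials` with shape (n_trials, n_samples), sampling rate `sfreq`.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def itpc(trials, sfreq, band):
    """ITPC over time for one frequency band.

    trials : ndarray, shape (n_trials, n_samples)
    band   : (low_hz, high_hz), e.g. (4, 8) for theta
    """
    low, high = band
    b, a = butter(4, [low / (sfreq / 2), high / (sfreq / 2)], btype="band")
    filtered = filtfilt(b, a, trials, axis=-1)        # band-pass each trial
    phase = np.angle(hilbert(filtered, axis=-1))      # instantaneous phase per trial
    # ITPC(t) = | mean over trials of exp(i * phase(t)) |
    return np.abs(np.mean(np.exp(1j * phase), axis=0))

# Example with synthetic placeholder data (band edges are illustrative assumptions):
rng = np.random.default_rng(0)
sfreq = 250.0
trials = rng.standard_normal((60, 500))               # 60 trials, 2-s epochs
bands = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
for name, band in bands.items():
    print(name, itpc(trials, sfreq, band).mean().round(3))
```
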

Highlights

  • Perception is shaped by information arriving at multiple sensory systems, such as hearing speech through the auditory pathway and seeing a talker’s face through the visual pathway

  • First, to replicate previous investigations with auditory speech, musicians and non-musicians were compared based on their N1 and P2 amplitudes and latencies evoked by the auditory syllable /ba/

  • The current study contributes to previous findings on multimodal perception by investigating whether the AV modulation of speech is modified by previous AV experience, such as musical training, and whether the musical background of the participants can explain some variation across previous studies (Baart, 2016)


Introduction

Perception is shaped by information arriving at multiple sensory systems, such as hearing speech through the auditory pathway and seeing a talker’s face through the visual pathway. Research has further shown that visual information from facial articulations, which begins before sound onset, can serve as a cue that leads the perceiver to form predictions about the upcoming speech sound. This prediction by phonetically congruent visual information can modulate early processing of the audio signal: visual speech congruent with the auditory signal can speed up (shorten the latency of) and decrease the amplitude of the later component, P2 (van Wassenhove et al., 2005), which has a fronto-central distribution and is evoked around 200 ms after audio onset (Pratt, 2014).
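
To make these component measures concrete, the sketch below shows one common way to quantify N1 and P2 from trial-averaged waveforms: subtract the video-only average from the audiovisual average (AV−VO) and take the peak amplitude and latency within typical search windows around 100 ms and 200 ms. This is an illustrative, assumption-laden example (random placeholder data, assumed windows and sampling rate), not the study's actual measurement procedure.

```python
# Illustrative sketch (assumed windows and sampling rate, not the study's parameters):
# measure N1/P2 peak amplitude and latency from an AV-minus-VO evoked waveform.
import numpy as np

def peak(evoked, times, window, polarity):
    """Return (latency_s, amplitude) of the most extreme deflection in `window`.

    polarity = -1 for a negative-going component (N1), +1 for positive (P2).
    """
    mask = (times >= window[0]) & (times <= window[1])
    idx = np.argmax(polarity * evoked[mask])
    return times[mask][idx], evoked[mask][idx]

sfreq = 250.0
times = np.arange(-0.2, 0.6, 1 / sfreq)                 # epoch: -200 to 600 ms

# Placeholder trial-averaged waveforms for one fronto-central channel (assumed inputs).
rng = np.random.default_rng(1)
av_evoked = rng.standard_normal(times.size)             # AV condition average
vo_evoked = rng.standard_normal(times.size)             # VO condition average

av_minus_vo = av_evoked - vo_evoked                      # isolate the audio-related response

n1_lat, n1_amp = peak(av_minus_vo, times, (0.08, 0.16), polarity=-1)   # ~100 ms window
p2_lat, p2_amp = peak(av_minus_vo, times, (0.15, 0.28), polarity=+1)   # ~200 ms window
print(f"N1: {n1_lat*1000:.0f} ms, {n1_amp:.2f} uV; P2: {p2_lat*1000:.0f} ms, {p2_amp:.2f} uV")
```
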
