Abstract

Shams and Kim (S&K) [1] argue that visual perception can be affected by sound and touch, and that such alterations can occur at early stages of processing. This view is by now well documented, but here I would like to draw some parallels with another domain, the processing of audiovisual speech, that allow interesting generalizations to the auditory domain. It is well known that the identification of an auditory utterance can be strongly biased by concurrent lipread information, and that lipread information may even affect activity in “sensory-specific” auditory cortex [2,3]. Less well known is that the perception of auditory speech is, like visual perception, highly adaptive even in the mature brain. Recent research has demonstrated that adult listeners readily use lipread information to adjust the phonetic boundary between two speech categories [4]: an ambiguous sound halfway between /b/ and /d/, for example, is perceived as /b/ if it was previously combined with lipread /b/, but as /d/ if the same sound was previously combined with lipread /d/. Lipread information can thus serve as a ‘teacher’ for auditory perception. Audiovisual speech also provides a unique test case for the problem of causal inference, without the need to change potentially important characteristics of the stimulus. As S&K rightly point out, multisensory integration only makes sense if the sensory signals are caused by the same object. This is a non-trivial inference problem for the brain, as it typically has no direct access to the causal structure of an event. S&K adopt a Bayesian approach to examine causal inference and interactions in the audiovisual spatial and numerical domains. We used another procedure to address the same question for speech, namely by creating ‘Sine Wave Speech’ (SWS) [5]. In SWS the spectral richness of speech is reduced to a few sinusoids, and naive listeners typically perceive the sounds as non-speech whistles. Only when informed that the sounds are derived from an utterance do listeners perceive them as speech, and they then cannot switch back to the non-speech mode. The interesting part is that only listeners in speech mode are affected by lipread information (in both immediate identification and adaptation aftereffects), while listeners in non-speech mode show no such intersensory integration effects. This demonstrates the causal-inference problem in the clearest way: lipread information affects sound identification only if the two information sources are attributed to the same event, namely an articulatory gesture.
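
To make the causal-inference problem concrete, the Bayesian treatment that S&K adopt can be sketched in generic notation (the symbols below are mine, not taken from their paper): the brain weighs a common-cause hypothesis C = 1 against an independent-causes hypothesis C = 2, given an auditory signal x_A and a visual signal x_V,

    P(C = 1 \mid x_A, x_V) = \frac{P(x_A, x_V \mid C = 1)\, P(C = 1)}{P(x_A, x_V \mid C = 1)\, P(C = 1) + P(x_A, x_V \mid C = 2)\, \bigl(1 - P(C = 1)\bigr)}

Integration of the two signals is warranted only to the extent that this posterior favors the common-cause hypothesis; the speech versus non-speech manipulation described above can be read as shifting that inference wholesale while leaving the stimulus itself unchanged.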
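
The SWS manipulation itself is simple to illustrate. The following sketch is illustrative only and not the procedure of [5]: the function name, parameters, and the assumption that formant tracks have already been estimated (e.g. by LPC analysis) are mine. It resynthesizes an utterance as a few time-varying sinusoids that follow the formant center frequencies and amplitudes:

    import numpy as np

    def sine_wave_speech(formant_freqs, formant_amps, sr=16000, hop=160):
        """Resynthesize an utterance as a few sinusoids (illustrative sketch).

        formant_freqs, formant_amps: arrays of shape (n_frames, n_formants)
        with per-frame center frequencies (Hz) and amplitudes of, typically,
        the first three formants, estimated beforehand (e.g. by LPC analysis).
        """
        n_frames, n_formants = formant_freqs.shape
        n_samples = n_frames * hop
        frame_times = np.arange(n_frames) * hop      # frame positions in samples
        sample_times = np.arange(n_samples)
        out = np.zeros(n_samples)
        for k in range(n_formants):
            # Interpolate the frame-rate tracks up to the audio sample rate
            freq = np.interp(sample_times, frame_times, formant_freqs[:, k])
            amp = np.interp(sample_times, frame_times, formant_amps[:, k])
            # A sinusoid with time-varying frequency via cumulative phase
            phase = 2.0 * np.pi * np.cumsum(freq) / sr
            out += amp * np.sin(phase)
        return out / max(np.max(np.abs(out)), 1e-9)  # normalize to [-1, 1]

The resulting signal preserves the coarse time-varying spectral structure of the original utterance but strips away the harmonic richness that normally signals a vocal source, which is why naive listeners hear whistles rather than speech.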
