Abstract

Events encoded in separate sensory modalities, such as audition and vision, can seem to be synchronous across a relatively broad range of physical timing differences. This may suggest that the precision of audio-visual timing judgments is inherently poor. Here we show that this is not necessarily true. We contrast timing sensitivity for isolated streams of audio and visual speech, and for streams of audio and visual speech accompanied by additional, temporally offset, visual speech streams. We find that the precision with which synchronous streams of audio and visual speech are identified is enhanced by the presence of additional streams of asynchronous visual speech. Our data suggest that timing perception is shaped by selective grouping processes, which can result in enhanced precision in temporally cluttered environments. The imprecision suggested by previous studies might therefore be a consequence of examining isolated pairs of audio and visual events. We argue that when an isolated pair of cross-modal events is presented, they tend to group perceptually and to seem synchronous as a consequence. We have revealed greater precision by providing multiple visual signals, possibly allowing a single auditory speech stream to group selectively with the most synchronous visual candidate. The grouping processes we have identified might be important in daily life, such as when we attempt to follow a conversation in a crowded room.

Introduction

Determining the simultaneity of events occurring in multiple sensory modalities is a conceptually challenging task. The environment in which humans exist is cluttered, with many events occurring in close spatial and temporal proximity. This situation is exacerbated by differences in transmission times for auditory and visual information originating from a single physical event: light reaches the observer almost instantaneously whereas sound travels relatively slowly, and the two signals are then transduced and processed at different rates by the nervous system. The range of physical timing differences across which audio and visual events can seem synchronous is shaped by content, with audiovisual (AV) speech likely to be judged as synchronous across a broader range of timing differences than more basic stimuli, such as light flashes and beeps [2,4,13]. Changes in this range can also occur as a result of learning and previous experience [17,18,19,20,21,22].
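To make the idea of a narrow versus broad synchrony criterion concrete, the sketch below illustrates how simultaneity-judgment data are commonly summarized; it is not code, a model, or parameter values from this paper. It expresses the proportion of "synchronous" responses as a window over audio-visual stimulus onset asynchrony (SOA), with the window width standing in for how broad an observer's criterion is; the Gaussian form and all numbers are assumptions chosen purely for illustration.

```python
import numpy as np

def p_synchronous(soa_ms, center=0.0, width=120.0):
    """Probability of judging an AV pair synchronous at a given SOA (ms).

    Modeled here as a Gaussian window (an illustrative assumption, not the
    paper's model): `center` captures any bias toward audio-lead or
    visual-lead timing, and `width` (SD, in ms) captures how broad the
    observer's synchrony criterion is.
    """
    return np.exp(-0.5 * ((soa_ms - center) / width) ** 2)

# Negative SOA = audio leads vision (ms); values are arbitrary examples.
soas = np.array([-300, -150, 0, 150, 300])

narrow = p_synchronous(soas, width=60.0)    # precise observer (e.g., flash/beep)
broad = p_synchronous(soas, width=160.0)    # broad criterion (e.g., AV speech)

print(np.round(narrow, 2))
print(np.round(broad, 2))
```

Under these assumed parameters, the broad window still yields high rates of "synchronous" judgments at offsets of 150 ms or more, mirroring the wide tolerance reported for AV speech relative to simpler stimuli.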
