Audiovisual perception remains an intriguing phenomenon, especially when we consider the physical and neuronal differences underlying the perception of sound and light. Physically, there is a delay of ∼3 ms/m between the emission of a sound and its arrival at the observer; on the other hand, acoustic transduction is known to be very fast (∼1 ms). Conversely, the speed of light makes the physical delay negligible, whereas phototransduction is comparatively slow (∼50 ms). Temporally mismatched audio and visual stimuli can still be perceived as a single coherent audiovisual stimulus, but a sound delay is often required to achieve the best perception of synchrony. In this study, we analyze the Point of Subjective Synchrony (PSS) as a function of stimulus distance to determine whether individuals take sound velocity into account or compensate for differences in transduction time when judging synchrony. Using an audiovisual virtual-reality environment (CAVE-like) with Point Light Walkers (PLW) as visual stimuli and the sound of footsteps as auditory stimuli, we presented audiovisual sequences with audio asynchronies ranging from −285 to +300 ms, at different distances from the observer (10, 15, 20, 25, 30 and 35 m), and under three conditions that differed only in the number of visual and auditory depth cues. The results show a relation between PSS and stimulus distance that is congruent with the difference in propagation velocity between sound and light. As the number of depth cues presented increases, this relation appears increasingly close to a model based on compensation for these physical differences.
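To make the ∼3 ms/m figure concrete, the sketch below computes the acoustic travel time at each tested distance, which is what a model of full compensation for sound propagation would predict as the PSS. This is an illustrative calculation, not the authors' analysis: it assumes a speed of sound of 343 m/s (dry air at about 20 °C), treats light travel time as zero, and assumes positive values mean audio lagging video. The function names are hypothetical.

```python
# Hypothetical sketch: predicted PSS under a full-compensation model, in which
# the observer expects sound to lag light by its acoustic travel time.
# Assumptions (not values reported in the study): c_sound = 343 m/s,
# light travel time = 0, positive PSS = audio lagging video.

SPEED_OF_SOUND_M_PER_S = 343.0          # assumed, dry air at ~20 °C
DISTANCES_M = [10, 15, 20, 25, 30, 35]  # distances tested in the study


def delay_per_metre_ms(speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Acoustic delay per metre in milliseconds (~2.9 ms/m at 343 m/s)."""
    return 1000.0 / speed_m_per_s


def predicted_pss_ms(distance_m: float,
                     speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Full-compensation model: PSS equals the acoustic travel time in ms."""
    return distance_m * delay_per_metre_ms(speed_m_per_s)


if __name__ == "__main__":
    print(f"slope: {delay_per_metre_ms():.2f} ms/m")
    for d in DISTANCES_M:
        print(f"{d:>2} m -> predicted PSS = {predicted_pss_ms(d):6.1f} ms (audio lag)")
```

Under these assumptions the predictions run from roughly 29 ms at 10 m to roughly 102 ms at 35 m, all inside the −285 to +300 ms asynchrony range that was presented.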