Motion is represented by low-level signals, such as size-expansion in vision or loudness changes in the auditory modality. The visual and auditory signals from the same object or event may be integrated and facilitate detection. We explored behavioural and electrophysiological correlates of congruent and incongruent audio–visual depth motion in conditions where auditory level changes, visual expansion, and visual disparity cues were manipulated. In Experiment 1 participants discriminated auditory motion direction whilst viewing looming or receding, 2D or 3D, visual stimuli. Responses were faster and more accurate for congruent than for incongruent audio–visual cues, and the congruency effect (i.e., difference between incongruent and congruent conditions) was larger for visual 3D cues compared to 2D cues. In Experiment 2, event-related potentials (ERPs) were collected during presentation of the 2D and 3D, looming and receding, audio–visual stimuli, while participants detected an infrequent deviant sound. Our main finding was that audio–visual congruity was affected by retinal disparity at an early processing stage (135–160ms) over occipito-parietal scalp. Topographic analyses suggested that similar brain networks were activated for the 2D and 3D congruity effects, but that cortical responses were stronger in the 3D condition. Differences between congruent and incongruent conditions were observed between 140–200ms, 220–280ms, and 350–500ms after stimulus onset.