The aim of this study was (1) to provide behavioral evidence for multimodal feature integration in an object recognition task in humans and (2) to characterize the processing stages and the neural structures where multisensory interactions take place. Event-related potentials (ERPs) were recorded from 30 scalp electrodes while subjects performed a forced-choice reaction-time categorization task: At each trial, the subjects had to indicate which of two objects was presented by pressing one of two keys. The two objects were defined by auditory features alone, visual features alone, or the combination of auditory and visual features. Subjects were more accurate and rapid at identifying multimodal than unimodal objects. Spatiotemporal analysis of ERPs and scalp current densities revealed several auditory-visual interaction components temporally, spatially, and functionally distinct before 200 msec poststimulus. The effects observed were (1) in visual areas, new neural activities (as early as 40 msec poststimulus) and modulation (amplitude decrease) of the N185 wave to unimodal visual stimulus, (2) in the auditory cortex, modulation (amplitude increase) of subcomponents of the unimodal auditory N1 wave around 90 to 110 msec, and (3) new neural activity over the right fronto-temporal area (140 to 165 msec). Furthermore, when the subjects were separated into two groups according to their dominant modality to perform the task in unimodal conditions (shortest reaction time criteria), the integration effects were found to be similar for the two groups over the nonspecific fronto-temporal areas, but they clearly differed in the sensory-specific cortices, affecting predominantly the sensory areas of the nondominant modality. Taken together, the results indicate that multisensory integration is mediated by flexible, highly adaptive physiological processes that can take place very early in the sensory processing chain and operate in both sensory-specific and nonspecific cortical structures in different ways.