The human experience demands seamless attentional switches between sensory modalities. Aging raises questions about how declines in auditory and visual processing affect cross-modal attention switching. This study used a cued cross-modal attention-switching paradigm in which visual and auditory stimuli were presented simultaneously on either spatially congruent or incongruent sides. A modality cue indicated the target modality, requiring a spatial left-versus-right key-press response. EEG was recorded during task performance. We investigated whether the mixing costs (decreased performance for repetition trials in a mixed task compared with a single task) and switch costs (decreased performance for a switch of target modality compared with a repetition) in cross-modal attention-switching paradigms would resemble, in behavioral performance and ERP components, those observed in traditional unimodal attention-switching paradigms. Specifically, we focused on three ERP components: cue-locked P3 (mixing/switch-related increased positivity), target-locked P3 (mixing/switch-related decreased positivity), and target-locked lateralized readiness potential (mixing/switch-related longer latency). In addition, we assessed how aging impacts cross-modal attention-switching performance. Results revealed that older adults exhibited more pronounced mixing and switch costs than younger adults, especially when visual and auditory stimuli were presented on incongruent sides. ERP findings showed increased cue-locked P3 amplitude, prolonged cue-locked P3 latency, decreased target-locked P3 amplitude, and prolonged target-locked P3 latency in association with switch costs, and prolonged onset latency of the target-locked lateralized readiness potential in association with mixing costs. Age-related effects were significant only for cue-locked P3 amplitude, cue-locked P3 latency (switch-related), and target-locked P3 latency (switch-related).
These findings suggest that the larger mixing and switch costs in older adults arose because older adults used modality cues inefficiently to update a representation of the relevant task sets and required more processing time to evaluate and categorize the target.
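
As a rough illustration of the two behavioral measures defined above, the following minimal sketch computes a mixing cost (mixed-block repetition trials minus single-task trials) and a switch cost (switch trials minus mixed-block repetition trials) from mean reaction times. The function name, its parameters, and the reaction times are hypothetical and chosen only for illustration; they are not the study's analysis code or data, and the actual analysis may have used other measures or exclusion rules.

```python
from statistics import mean


def attention_costs(single_rts, mixed_repeat_rts, mixed_switch_rts):
    """Return (mixing_cost, switch_cost) in the same unit as the inputs.

    single_rts: reaction times from single-task blocks.
    mixed_repeat_rts: reaction times from repetition trials in mixed blocks.
    mixed_switch_rts: reaction times from switch trials in mixed blocks.
    """
    single = mean(single_rts)
    repeat = mean(mixed_repeat_rts)
    switch = mean(mixed_switch_rts)
    # Mixing cost: slowing on repetition trials caused by the mixed context.
    mixing_cost = repeat - single
    # Switch cost: additional slowing when the target modality changes.
    switch_cost = switch - repeat
    return mixing_cost, switch_cost


# Made-up reaction times in milliseconds, not data from the study.
mixing, switch = attention_costs(
    single_rts=[500, 520, 510],
    mixed_repeat_rts=[580, 600, 590],
    mixed_switch_rts=[660, 640, 650],
)
print(f"mixing cost: {mixing} ms, switch cost: {switch} ms")
```

Positive values indicate slower responding in the more demanding condition; the same subtraction can be applied to error rates instead of reaction times.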