To process the most relevant information effectively, the brain anticipates the optimal moment at which to allocate attentional resources. Behavior can be optimized by automatically aligning attention with external rhythmic structures, whether visual or auditory. Although the auditory modality is known to represent temporal information efficiently, research has not yet established whether visual or auditory rhythms hold an advantage in entraining temporal attention. The present study directly compared the effects of auditory and visual rhythmic cues on the discrimination of visual targets (Experiment 1) and auditory targets (Experiment 2). The role of endogenous spatial attention was also considered. When and where the target was most likely to occur were cued by unimodal (visual or auditory) and bimodal (audiovisual) signals. A sequence of salient events was used to elicit rhythm-based temporal expectations, and a symbolic predictive cue served to orient spatial attention. The results suggest a superiority of auditory over visual rhythms, irrespective of spatial attention, regardless of whether the spatial cue and the rhythm converged (unimodal or bimodal), and regardless of the target modality (visual or auditory). These findings are discussed in terms of modality-specific rhythmic orienting, while considering a single, supramodal system operating in a top-down manner for endogenous spatial attention.