Reward value and selective attention both enhance the representation of sensory stimuli at the earliest stages of processing. It is still debated whether and how reward-driven and attentional mechanisms interact to influence perception. Here we ask whether the interaction between reward value and selective attention depends on the sensory modality through which the reward information is conveyed. Human participants first learned the reward value of unimodal visual and auditory stimuli during a conditioning phase. They then performed a target detection task on bimodal stimuli that contained a previously rewarded stimulus in one, both, or neither of the modalities. Participants were also required to focus their attention on one side and to report only targets on the attended side. Our results showed a strong modulation of visual and auditory event-related potentials (ERPs) by spatial attention. We found no main effect of reward value. Importantly, however, we found an interaction: reward value significantly affected the strength of the attentional modulation of the ERPs. When reward effects were examined separately for each modality, auditory value-driven modulation of attention dominated the ERP effects, whereas visual reward value on its own had no effect, likely because it interfered with target processing. These results suggest a two-stage model. At the first stage, the salience of a high-reward stimulus is enhanced on a local priority map specific to each sensory modality. At the second stage, reward value and top-down attentional mechanisms are integrated across sensory modalities to affect perception.