Previous findings have been inconsistent as to whether aversive auditory cues can modulate auditory or visual spatial attention. In three experiments, we used direct aversive auditory stimuli (white noise) as cues and explored which subcomponents of attentional bias contribute to auditory and cross-modal spatial attention in unselected samples. In Experiment 1, using a dot-probe paradigm, we presented auditory stimuli (aversive or neutral) as cues and a tick sound as the target, with two stimulus onset asynchrony (SOA) conditions: 150 ms and 500 ms. Participants responded faster on congruent trials than on incongruent trials and exhibited an auditory emotional attentional bias toward aversive auditory stimuli in the 150 ms SOA condition. In Experiments 2 and 3, we employed an auditory emotional spatial cueing task with neutral and negative auditory stimuli as cues. Targets were auditory stimuli (Experiment 2) or visual stimuli (Experiment 3). In the 150 ms SOA condition of Experiment 2, participants responded faster to targets primed by negative cues than to those primed by neutral cues on valid trials, and slower to targets primed by negative cues than to those primed by neutral cues on invalid trials. Thus, both speeded engagement with and delayed disengagement from aversive auditory stimuli were present in the 150 ms SOA condition; in the 500 ms SOA condition, only the former effect was present, and auditory inhibition of return was also observed. Experiment 3 produced similar results cross-modally, but cross-modal inhibition of return was not observed. Across all three experiments, we conclude that emotional attention can operate within the auditory modality and across sensory modalities, and that both engagement and disengagement biases contribute to auditory and cross-modal emotional attention.