Perceiving acoustic cues that convey music emotion is challenging for cochlear implant (CI) users. Emotional arousal (stimulating/relaxing) can be conveyed by temporal cues such as tempo, while emotional valence (positive/negative) can be conveyed by spectral information relevant to pitch and harmony. However, it is unclear to what extent other temporal and spectral features convey emotional arousal and valence, respectively, in music. In a music emotion categorization task with 23 normal-hearing participants, we used vocoders to vary the quality of temporal and spectral content. Musical excerpts conveyed joy (high arousal, high valence), fear (high arousal, low valence), serenity (low arousal, high valence), and sorrow (low arousal, low valence). The vocoder carrier (sinewave vs. noise) primarily manipulated temporal information, and the filter order (low vs. high) primarily manipulated spectral information. Improving temporal content (with sinewave carriers) and improving spectral content (with a high filter order) each improved categorization. The vocoder results were compared with data from 25 CI users who performed the same task with non-vocoded musical excerpts. The CI users showed a pattern of errors similar to that of the normal-hearing participants in the vocoded conditions, suggesting that improving the quality of temporal information, and not only spectral detail, could benefit music emotion perception in CI users.