The application of functional magnetic resonance imaging (fMRI) to the human spinal cord is still a relatively small field of research and faces many challenges. Here we aimed to probe the limitations of task-based spinal fMRI at 3T by investigating the reliability of spinal cord blood-oxygen-level-dependent (BOLD) responses to repeated nociceptive stimulation across two consecutive days in 40 healthy volunteers. We assessed the test-retest reliability of subjective ratings, autonomic responses, and spinal cord BOLD responses to short heat pain stimuli (1 s duration) using the intraclass correlation coefficient (ICC). At the group level, we observed robust autonomic responses as well as spatially specific spinal cord BOLD responses at the expected location, but no spatial overlap in BOLD response patterns across days. While autonomic indicators of pain processing showed good-to-excellent reliability, both β-estimates and z-scores of task-related BOLD responses showed poor reliability across days in the target region (gray matter of the ipsilateral dorsal horn). When we accounted for the sensitivity of gradient-echo echo-planar imaging (GE-EPI) to draining vein signals by including the venous plexus in the analysis, we observed BOLD responses with fair reliability across days. Taken together, these results demonstrate that heat pain stimuli as short as one second can evoke a robust and spatially specific BOLD response, which is, however, highly variable within participants over time, resulting in low reliability in the dorsal horn gray matter. Further improvements in data acquisition and analysis techniques are thus necessary before event-related spinal cord fMRI as used here can be reliably employed in longitudinal designs or clinical settings.
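To make the reliability metric concrete, the following is a minimal sketch of an ICC computation for a participants-by-days matrix. The abstract does not state which ICC form was used; ICC(2,1) (two-way random effects, absolute agreement, single measurement, following Shrout and Fleiss) is assumed here purely for illustration. The function name `icc_2_1` and the example values are hypothetical and are not data from the study.

```python
import numpy as np


def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    data: array-like of shape (n_participants, k_sessions).
    Assumption: this ICC form is chosen for illustration only.
    """
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()

    # Sums of squares from a two-way ANOVA without replication
    ss_rows = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)  # participants
    ss_cols = n * np.sum((x.mean(axis=0) - grand_mean) ** 2)  # sessions
    ss_total = np.sum((x - grand_mean) ** 2)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical values: one row per participant, columns are Day 1 and Day 2
example = [
    [0.8, 0.6],
    [1.5, 1.7],
    [0.2, 0.9],
    [1.1, 0.4],
    [2.0, 1.8],
]
print(f"ICC(2,1) = {icc_2_1(example):.2f}")
```

With two sessions, `k = 2` and each row holds one participant's Day 1 and Day 2 values (for example, β-estimates averaged over a region of interest). Values near 1 indicate that between-participant differences are preserved across days; values near 0 indicate that session effects and residual within-participant variability dominate.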