When multiple visual stimuli are presented simultaneously within a receptive field, the neural response is suppressed compared with the response when the same stimuli are presented sequentially. The prevailing hypothesis holds that this suppression arises from competition among multiple stimuli for limited resources within receptive fields, governed by task demands. However, it is unknown how stimulus-driven computations may give rise to simultaneous suppression. Using fMRI, we find simultaneous suppression in single voxels, which varies with both stimulus size and timing and progressively increases up the visual hierarchy. Using population receptive field (pRF) models, we find that compressive spatiotemporal summation, rather than compressive spatial summation, predicts simultaneous suppression, and that increased simultaneous suppression is linked to larger pRF sizes and stronger compressive nonlinearities. These results necessitate a rethinking of simultaneous suppression as the outcome of stimulus-driven compressive spatiotemporal computations within pRFs, and open new opportunities to study visual processing capacity across space and time.