Recent functional MRI (fMRI) studies have highlighted differences in responses to natural sounds along the rostral-caudal axis of the human superior temporal gyrus. However, because the fMRI signal is an indirect measure of neuronal activity, it has been challenging to relate these observations to actual neuronal response properties. To bridge this gap, we present a forward model of fMRI responses to natural sounds that combines a neuronal model of the auditory cortex with physiological modeling of the hemodynamic BOLD response. Neuronal responses are modeled with a dynamic recurrent firing rate model that reflects the tonotopic, hierarchical processing in the auditory cortex, along with the spectro-temporal tradeoff along the rostral-caudal axis of its belt areas. To link modeled neuronal response properties with human fMRI data in the auditory belt regions, we generated a space of neuronal models that differed parametrically in the spectral and temporal specificity of neuronal responses. We then obtained predictions of fMRI responses through a biophysical model of the hemodynamic BOLD response (P-DCM). Bayesian model comparison showed that the hemodynamic BOLD responses of the caudal belt regions in the human auditory cortex were best explained by modeling faster temporal dynamics and broader spectral tuning of neuronal populations, while those of the rostral belt regions were best explained by fine spectral tuning combined with slower temporal dynamics. These results support the hypothesis of complementary neural information processing along the rostral-caudal axis of the human superior temporal gyrus.
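To make the forward-modeling idea concrete, the following is a minimal sketch and not the model used in this work. It assumes a single non-recurrent leaky firing-rate unit in place of the recurrent tonotopic network, and a generic double-gamma hemodynamic response function in place of P-DCM. All function names, time constants, and the step stimulus are hypothetical and chosen only for illustration.

```python
import math
import numpy as np


def firing_rate(stimulus, tau, dt=0.005):
    """Leaky firing-rate unit, tau * dr/dt = -r + max(s, 0), via forward Euler.

    Assumption: a single unit without recurrence; tau (seconds) stands in for
    the temporal specificity that is varied across the model space.
    """
    r = np.zeros(len(stimulus), dtype=float)
    for t in range(1, len(stimulus)):
        drive = max(stimulus[t - 1], 0.0)
        r[t] = r[t - 1] + dt / tau * (-r[t - 1] + drive)
    return r


def double_gamma_hrf(dt, duration=30.0):
    """Generic double-gamma HRF (a simplified stand-in, not P-DCM)."""
    t = np.arange(0.0, duration, dt)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()  # normalize to unit sum


def predict_bold(stimulus, tau, dt=0.005):
    """Predicted BOLD time course: firing rate convolved with the HRF."""
    rate = firing_rate(stimulus, tau, dt)
    return np.convolve(rate, double_gamma_hrf(dt))[: len(stimulus)]


# Hypothetical model space: one prediction per candidate time constant.
dt = 0.005
stimulus = np.zeros(2000)
stimulus[200:1200] = 1.0  # 5 s step input as a toy sound envelope
predictions = {tau: predict_bold(stimulus, tau, dt) for tau in (0.02, 0.2)}
```

In a full analysis, each candidate prediction would be compared against measured fMRI responses with a model-comparison criterion; that step is omitted here because it depends on the data and noise model.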