Summary

In this study, different versions of autoregressive error models are evaluated as post-processors for probabilistic streamflow forecasts. The post-processors account for the hydrologic uncertainties introduced by the precipitation–runoff model. They are evaluated with the discrete ranked probability score (DRPS), and a non-parametric bootstrap is applied to test whether differences in model performance are significant. The results show that the differences in performance between most model versions are significant. For these cases it is found that (1) error models with state-dependent parameters perform better than those with constant parameters, (2) error models that describe the standardized residuals with an empirical distribution perform better than those that assume a normal distribution, and (3) procedures that apply a logarithmic transformation to the original streamflow values perform better than those that apply a square-root transformation.
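To make the ingredients named in the summary concrete, the following is a minimal sketch of one possible configuration: a first-order autoregressive error model with constant parameters, fitted to log-transformed flows, with the residuals resampled from their empirical distribution. It is an illustration under stated assumptions and not the implementation used in the study. The function names, the AR order of one, the least-squares fit, the DRPS normalization by the number of thresholds, and the percentile bootstrap on paired score differences are all assumptions made for this example.

```python
import numpy as np


def fit_ar1_log(sim, obs):
    """Fit a constant-parameter AR(1) model to log-space errors (assumed form)."""
    err = np.log(obs) - np.log(sim)          # error after log transformation
    prev, curr = err[:-1], err[1:]
    phi = np.dot(prev, curr) / np.dot(prev, prev)  # least squares, no intercept
    resid = curr - phi * prev                # residuals kept as an empirical sample
    return phi, resid


def forecast_ensemble(sim_next, last_err, phi, resid, n_members, rng):
    """One-step-ahead ensemble obtained by resampling the empirical residuals."""
    eps = rng.choice(resid, size=n_members, replace=True)
    return np.exp(np.log(sim_next) + phi * last_err + eps)


def drps(ensemble, obs, thresholds):
    """Discrete ranked probability score, averaged over the thresholds."""
    thresholds = np.asarray(thresholds, dtype=float)
    f = np.array([(ensemble <= t).mean() for t in thresholds])  # forecast CDF
    o = (obs <= thresholds).astype(float)                       # observed step CDF
    return np.mean((f - o) ** 2)


def bootstrap_mean_diff(score_a, score_b, n_boot, rng):
    """95% percentile interval for the mean paired score difference."""
    diff = np.asarray(score_a) - np.asarray(score_b)
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    return np.percentile(diff[idx].mean(axis=1), [2.5, 97.5])
```

In this sketch, two post-processor versions would be compared by computing the DRPS of each for every forecast, passing the two score series to `bootstrap_mean_diff`, and checking whether the resulting interval excludes zero. Resampling individual forecasts treats the scores as independent, which is a simplification for serially correlated streamflow data.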