Abstract. Dynamical (i.e., model-based) methods are widely used by forecasting centers to generate seasonal streamflow forecasts, building upon process-based hydrological models that require parameter specification (i.e., calibration). Here, we investigate the extent to which the choice of calibration objective function affects the quality of seasonal (spring–summer) streamflow hindcasts produced with the traditional ensemble streamflow prediction (ESP) method and explore connections between hindcast skill and hydrological consistency – measured in terms of biases in hydrological signatures – obtained from the model parameter sets. To this end, we calibrate three popular conceptual rainfall-runoff models (GR4J, TUW, and Sacramento) using 12 different objective functions, including seasonal metrics that emphasize errors during the snowmelt period, and produce hindcasts for five initialization times over a 33-year period (April 1987–March 2020) in 22 mountain catchments that span diverse hydroclimatic conditions along the semiarid Andes Cordillera (28–37° S). The results show that the choice of calibration metric becomes relevant as the winter (snow accumulation) season begins (i.e., 1 July), enhancing inter-basin differences in hindcast skill as initializations approach the beginning of the snowmelt season (i.e., 1 September). The comparison of seasonal hindcasts shows that the hydrological consistency – quantified here through biases in streamflow signatures – obtained with some calibration metrics (e.g., Split KGE (Kling–Gupta efficiency), which gives equal weight to each water year in the calibration time series) does not ensure satisfactory seasonal ESP forecasts and that the metrics that provide skillful ESP forecasts (e.g., VE-Sep, which quantifies seasonal volume errors) do not necessarily yield hydrologically consistent model simulations.
Among the options explored here, an objective function that combines the Kling–Gupta efficiency (KGE) and the Nash–Sutcliffe efficiency (NSE) with flows in log space provides the best compromise between hydrologically consistent simulations and hindcast performance. Finally, the choice of calibration metric generally affects the magnitude, rather than the sign, of correlations between hindcast quality attributes and catchment descriptors, the baseflow index and interannual runoff variability being the best predictors of forecast skill. Overall, this study highlights the need for careful parameter estimation strategies in the forecasting production chain to generate skillful forecasts from hydrologically consistent simulations and draw robust conclusions on streamflow predictability.