A typical deficiency of ensemble forecasts is the lack of calibration; e.g., ensemble members are under-dispersed and, consequently, probabilities estimated from them cannot be taken at face value. Self-calibration is one of the unique theoretic properties of the Bayesian forecasting system (BFS). Its ensemble version, EBFS, inherits this property, provided the ensemble size is large enough. That requirement motivated the version with randomization, EBFSR, presented in part I. Its unique advantage is the operational feasibility of generating a hydrologic ensemble forecast of large size, $M$ (having hundreds or even thousands of members), from a meteorologic input ensemble forecast of small size, $M_I$ (having only tens or hundreds of members): $M_I < M$, with $M = M_I R$, where $R$ is the randomization factor ($R > 1$ and integer). This $R$-fold enlargement of the ensemble size is achieved through a Monte Carlo generator (without extra runs of the hydrologic model) and, therefore, requires little extra effort or computing time. But what are the statistical implications?

The objective of this part II is to identify experimentally the sampling properties of the EBFSR so that its advantage can be properly harnessed in operational forecasting. The core matter to understand and to quantify is the tradeoff between the computing efficiency and the statistical efficiency of the EBFSR. This is so because the user must specify (i) the largest acceptable sampling error [measured in terms of the average expected maximum absolute difference, $\bar{E}(\mathrm{MAD})$, between the true but unknown predictive distribution function of the forecasted variate and the empirical distribution function estimated from a sample of size $m$, a fraction of the ensemble size $M$], and (ii) the tradeoff between $M_I$ and $R$.
Critical to deciding this tradeoff are two relationships: $\bar{E}(\mathrm{MAD}) = \xi(m)$ and $M_E = \tau(M_P, R)$, where $M_P$ is the size of the input sub-ensemble with positive precipitation amounts, and $M_E$ is the effective sample size of the output sub-ensemble of size $M_P R$. In terms of output/input ensemble sizes, the computing efficiency is $M / M_I = R > 1$, and the statistical efficiency is $M_E / (M_P R) < 1$ because $M_P < M_E < M_P R$.

Based on results of large numerical experiments, general parametric forms of $\xi$ and $\tau$ are identified, and the dependence of the parameter values is established on factors such as the forecast type (probabilistic river stage forecast, probabilistic stage transition forecast, probabilistic flood forecast), the forecast lead time, the magnitude of the precipitation input uncertainty, and the informativeness of the deterministic hydrologic model employed by the EBFSR. Finally, two algorithms are formulated to aid the user in specifying the values of $M$ and $R$, such that $\bar{E}(\mathrm{MAD})$ does not exceed the largest acceptable value and that operational constraints are satisfied (e.g., constraints regarding the input ensemble size $M_I$ or the feasible number of runs of the hydrologic model).
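
To make the sampling-error measure concrete, the following is a minimal illustrative sketch, not the experimental procedure of the paper. It estimates by simulation the expected maximum absolute difference between a known distribution function and the empirical distribution function of a sample of size m. Several things here are assumptions made only for illustration: the standard normal distribution stands in for the true (in practice unknown) predictive distribution, the sample is drawn independently (the EBFSR output sub-ensemble is not, which is why its effective sample size is smaller than its nominal size), and the function names and the number of replications are hypothetical.

```python
import math
import random


def normal_cdf(x):
    """Standard normal distribution function (assumed stand-in for the true one)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def max_abs_difference(sample, cdf):
    """Maximum absolute difference between cdf and the empirical
    distribution function of the sample."""
    xs = sorted(sample)
    m = len(xs)
    mad = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical distribution function jumps from i/m to (i+1)/m at x,
        # so the largest gap occurs just below or at one of the sample points.
        mad = max(mad, abs(f - i / m), abs((i + 1) / m - f))
    return mad


def expected_mad(m, replications=200, seed=0):
    """Monte Carlo estimate of the expected maximum absolute difference
    for independent samples of size m (hypothetical helper)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replications):
        sample = [rng.gauss(0.0, 1.0) for _ in range(m)]
        total += max_abs_difference(sample, normal_cdf)
    return total / replications


if __name__ == "__main__":
    # Larger samples should give smaller expected sampling error.
    for m in (25, 100, 400, 1600):
        print(m, round(expected_mad(m), 4))
```

For independent samples the estimate shrinks roughly in proportion to the inverse square root of m. The relationship ξ identified in the paper plays the analogous role for EBFSR ensembles, with parameter values that depend on the factors listed above.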