Abstract

The benefits of ensemble forecast calibration were examined for three variables: 500-hPa geopotential height (Z500), 850-hPa temperature (T850), and 2-m temperature (T2M). A large reforecast dataset was used for the calibration. Two calibration methods were examined: a correction for a gross bias in the forecast, and an analog method that implicitly adjusted for bias and spread and applied downscaling where appropriate. The characteristics of probabilistic forecasts from the raw ensemble were also considered. Forecasts were evaluated using rank histograms and the continuous ranked probability skill score. T2M rank histograms showed a high population of extreme ranks at all leads, and a correction for model bias alleviated this only slightly. The extreme ranks of Z500 rank histograms were slightly underpopulated at short leads but slightly overpopulated at longer leads. T850 had characteristics in between those of T2M and Z500. Accordingly, Z500 was the most skillful variable without calibration and the variable least improved by calibration, and the bias correction achieved most of the improvement in skill. For T850, there was a more substantial additional increase in skill relative to the bias correction when the analog technique was applied. For T2M, probabilistic forecasts from the raw ensemble were the least skillful, the application of a bias correction substantially increased the skill, and the application of the analog technique produced the largest further increase in skill relative to the bias correction. Hence, reforecast datasets may be particularly helpful in the improvement of probabilistic forecasts of the variables that are most directly relevant to many forecast users (i.e., the sensible surface-weather variables).
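As an illustration of the two verification tools named in the abstract, the following is a minimal sketch of a rank histogram and an ensemble-based continuous ranked probability score (CRPS), together with a gross bias correction estimated from a training sample. It is not the paper's implementation. The function names, the synthetic data, the 10-member ensemble, the assumed bias of 1.5, and the use of the raw ensemble as the skill-score reference are all assumptions made for illustration; the analog method is not shown.

```python
import numpy as np

def rank_histogram(ensemble, obs):
    """Count the rank of each observation within its ensemble.

    ensemble: (n_cases, n_members), obs: (n_cases,).
    Returns n_members + 1 counts. Ties are not randomized (simplification).
    """
    ensemble = np.asarray(ensemble, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Rank = number of members strictly below the observation.
    ranks = np.sum(ensemble < obs[:, None], axis=1)
    return np.bincount(ranks, minlength=ensemble.shape[1] + 1)

def crps_ensemble(ensemble, obs):
    """Mean CRPS of the empirical ensemble CDF: E|X - y| - 0.5 * E|X - X'|."""
    ensemble = np.asarray(ensemble, dtype=float)
    obs = np.asarray(obs, dtype=float)
    term1 = np.mean(np.abs(ensemble - obs[:, None]), axis=1)
    pairwise = np.abs(ensemble[:, :, None] - ensemble[:, None, :])
    term2 = 0.5 * np.mean(pairwise, axis=(1, 2))
    return float(np.mean(term1 - term2))

def crpss(crps_forecast, crps_reference):
    """Skill score: 1 is perfect, 0 matches the reference, negative is worse."""
    return 1.0 - crps_forecast / crps_reference

# Synthetic example (all values are assumptions, not data from the paper).
rng = np.random.default_rng(0)
truth = rng.normal(0.0, 1.0, size=2000)
ens = truth[:, None] + 1.5 + rng.normal(0.0, 0.5, size=(2000, 10))

# Estimate the gross bias on a training half, apply it to the other half.
train, test = slice(0, 1000), slice(1000, 2000)
bias = np.mean(ens[train].mean(axis=1) - truth[train])
ens_bc = ens[test] - bias

hist_raw = rank_histogram(ens[test], truth[test])
hist_bc = rank_histogram(ens_bc, truth[test])
crps_raw = crps_ensemble(ens[test], truth[test])
crps_bc = crps_ensemble(ens_bc, truth[test])
skill_vs_raw = crpss(crps_bc, crps_raw)
```

In this toy setup the raw ensemble sits almost entirely above the observations, so `hist_raw` piles up in the lowest rank, which is the kind of extreme-rank overpopulation the abstract describes for T2M. Here the skill score uses the raw ensemble as the reference only to show the effect of the correction; a study would normally score against a reference such as climatology.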