Abstract

The evaluation of forecast performance plays a central role both in the interpretation and use of forecast systems and in their development. Different evaluation measures (scores) are available, often quantifying different characteristics of forecast performance. The properties of several proper scores for probabilistic forecast evaluation are contrasted and then used to interpret decadal probability hindcasts of global mean temperature. The Continuous Ranked Probability Score (CRPS), the Proper Linear (PL) score, and I. J. Good's logarithmic score (also referred to as Ignorance) are compared; although information from all three may be useful, the logarithmic score has an immediate interpretation and is sensitive to forecast busts. Neither CRPS nor PL is local; this is shown to produce counterintuitive evaluations under CRPS. Benchmark forecasts from empirical models such as Dynamic Climatology place the scores in context. Comparing scores for forecast systems based on physical models (in this case HadCM3, from the CMIP5 decadal archive) against such benchmarks is more informative than comparing forecast systems based on similar physical simulation models only with each other. It is shown that a forecast system based on HadCM3 outperforms Dynamic Climatology in decadal hindcasts of global mean temperature; Dynamic Climatology previously outperformed a forecast system based on HadGEM2, and reasons for these results are suggested. Forecasts of aggregate quantities (5-year means of global mean temperature) are, of course, narrower than forecasts of annual averages because averaging suppresses variance; while the average "distance" between forecast and target may be expected to decrease, little if any discernible improvement in probabilistic skill is achieved.
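As a rough, self-contained illustration (not taken from the paper) of how the three scores treat the same forecast, the sketch below evaluates a Gaussian forecast density at a typical outcome and at a forecast bust, using the Ignorance, one common negatively oriented quadratic ("proper linear") form, and the standard closed-form CRPS for a Gaussian. The forecast parameters and outcomes are invented for illustration.

```python
# Minimal sketch (illustrative values, not the paper's data or code); for all
# three scores as written here, lower is better.
import numpy as np
from scipy.stats import norm

def ignorance(mu, sigma, y):
    """Logarithmic (Ignorance) score in bits: -log2 of the forecast density at y."""
    return -np.log2(norm.pdf(y, mu, sigma))

def proper_linear(mu, sigma, y):
    """Quadratic ('proper linear') score: integral of p^2 minus 2*p(y)."""
    int_p_squared = 1.0 / (2.0 * sigma * np.sqrt(np.pi))  # closed form for a Gaussian
    return int_p_squared - 2.0 * norm.pdf(y, mu, sigma)

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a Gaussian forecast N(mu, sigma) given outcome y."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm.cdf(z) - 1.0)
                    + 2.0 * norm.pdf(z) - 1.0 / np.sqrt(np.pi))

mu, sigma = 0.0, 1.0
for y in (0.5, 4.0):  # a typical outcome vs. a forecast 'bust' in the far tail
    print(f"y={y:>4}: IGN={ignorance(mu, sigma, y):6.2f} bits  "
          f"PL={proper_linear(mu, sigma, y):7.3f}  "
          f"CRPS={crps_gaussian(mu, sigma, y):6.3f}")
```

In this sketch the Ignorance penalty grows without bound as the outcome moves into the tail, the CRPS grows roughly linearly, and the quadratic score saturates; the relative behaviour, rather than the particular numbers, is the point.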

Highlights

  • Decision making would profit from reliable, high-fidelity probability forecasts for climate variables on decadal to centennial timescales

  • Measures of skill play a critical role in the development, deployment and application of probability forecasts

  • The choice of score quite literally determines what can be seen in the forecasts, influencing forecast system design and model development, as well as decisions on whether to purchase forecasts from a given forecast system or to invest in accordance with its probabilities

Introduction

Decision making would profit from reliable, high-fidelity probability forecasts for climate variables on decadal to centennial timescales. The intercomparison of simulation models is valuable in many ways; comparison of forecasts from simulation models with empirically based reference forecasts provides additional information. In particular, it aids in distinguishing the case in which every forecast system does well, so that the best system cannot be identified (equifinality), from the case in which every forecast system performs very poorly (equidismality) (Beven 2006; Suckling and Smith 2013). Some climate researchers have required the demonstration of skill against a simpler reference forecast as a condition for accepting any complicated forecasting scheme as useful (von Storch and Zwiers 1999). Both the empirical and simulation models used here are identified, and the primary target, global mean temperature (GMT), is discussed.
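To make the benchmark comparison concrete, the following hypothetical sketch scores two toy Gaussian forecast systems against a simple climatological reference using mean Ignorance. The data and parameters are synthetic and the reference is a static climatology, not the paper's Dynamic Climatology; the sketch only illustrates how benchmark-relative scores separate systems that add skill from those that do not.

```python
# Hypothetical sketch with synthetic data: mean Ignorance of two toy forecast
# systems relative to a climatological benchmark.  Negative relative Ignorance
# means the system beats the benchmark; both systems landing near zero would
# suggest equifinality, both far above zero equidismality.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
n = 200
signal = rng.normal(0.0, 0.8, n)            # predictable component (invented)
truth = signal + rng.normal(0.0, 0.6, n)    # synthetic verifications, total sd ~ 1

def mean_ignorance(mu, sigma, obs):
    """Average Ignorance (bits) of Gaussian forecasts N(mu, sigma) for obs."""
    return np.mean(-np.log2(norm.pdf(obs, mu, sigma)))

ign_clim = mean_ignorance(0.0, 1.0, truth)   # static climatological benchmark
ign_a = mean_ignorance(signal, 0.6, truth)   # system that captures the signal
ign_b = mean_ignorance(signal, 0.1, truth)   # same mean, but overconfident spread

for name, ign in (("system A", ign_a), ("system B", ign_b)):
    print(f"{name}: Ignorance relative to climatology = {ign - ign_clim:+.2f} bits")
```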

Measuring forecast performance
RMSE of the ensemble mean
Naive linear and proper linear scores
Continuous ranked probability score
Ignorance
Comparing the behaviour of ignorance and CRPS
Contrasting the skill of decadal forecasts under different scores
Simulation-based hindcasts
The Dynamic Climatology empirical model
Interpreting probabilistic forecast skill scores
Findings
Conclusions
