Differences in potential and actual skill in a decadal prediction experiment

G J Boer, V V Kharin, W J Merryfield

doi:10.1007/s00382-018-4533-4

Abstract

Decadal prediction results are analyzed for the predictability and skill of annual mean temperature. Forecast skill is assessed in terms of correlation, mean square error (MSE) and mean square skill score. The predictability of the forecast system is assessed by calculating the corresponding “potential” skill measures based on the behaviour of the forecast ensemble. The expectation is that potential skill, where the model predicts its own evolution, will be greater than the actual skill, where the model predicts the evolution of the real system, and that the difference is an indication of the potential for forecast improvement. This will depend, however, on the agreement of the second-order climate statistics of the forecasts with those of the climate system. In this study the forecast variance differs from the variance of the verifying observations over non-trivial parts of the globe. Observation-based values of variance from different sources also differ non-trivially. This is an area of difficulty independent of the forecasting system and also affects the comparison of actual and potential mean square error. It is possible to scale the forecast variance estimate to match that of the verifying data so as to avoid this consequence, but a variance mismatch, whatever its source, remains a difficulty when considering forecast system improvements. Maps of actual and potential correlation indicate that over most of the globe potential correlation is greater than actual correlation, as expected, with the difference suggesting, but not demonstrating, that it might be possible to improve skill. There are exceptions, mainly over some land areas in the Northern Hemisphere and at later forecast ranges, where actual correlation can exceed potential correlation, and this behaviour is ascribed to excessive noise variance in the forecasts, at least as compared to the verifying data. Sampling error can also play a role, but significance testing suggests it is not sufficient to explain the results. Similar results are obtained for MSE but only after scaling the forecasts to match the variance of the verifying observations. It is immediately clear that the forecast system is deficient, independent of other considerations, if the actual correlation is greater than the potential correlation and/or the actual MSE is less than the potential MSE, and this gives some indication of the nature of the deficiency in the forecasts in these regions. The predictable and noise components of an ensemble of forecasts can be estimated, but this is not the case for the actual system. The degree to which the difference between actual and potential skill indicates the potential for improvement of the forecasting system can only be judged indirectly. At a minimum the variances of the forecasts and of the verifying data should be in reasonable accord. If the potential skill is greater than the actual skill for a forecasting system based on a well-behaved model, it suggests, as a working hypothesis, that forecast skill can be improved so as to more closely approach potential skill.
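
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' estimator: it assumes a simple leave-one-out approach in which each ensemble member in turn stands in for the "truth" and is predicted by the mean of the remaining members. The function names, the synthetic data and the choice to match the total variance of the members to the observed variance are all assumptions made for illustration.

```python
import numpy as np

def correlation(a, b):
    """Pearson correlation of two 1-D time series."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

def mse(a, b):
    """Mean square error between two 1-D time series."""
    return float(np.mean((a - b) ** 2))

def actual_skill(ens, obs):
    """Skill of the ensemble mean against observations.

    ens has shape (n_members, n_times); obs has shape (n_times,).
    """
    forecast = ens.mean(axis=0)
    return correlation(forecast, obs), mse(forecast, obs)

def potential_skill(ens):
    """Leave-one-out 'model predicts itself' skill (an assumed estimator)."""
    m = ens.shape[0]
    total = ens.sum(axis=0)
    r, e = [], []
    for k in range(m):
        others = (total - ens[k]) / (m - 1)  # mean of the remaining members
        r.append(correlation(others, ens[k]))
        e.append(mse(others, ens[k]))
    return float(np.mean(r)), float(np.mean(e))

def scale_to_obs_variance(ens, obs):
    """Rescale member anomalies so their total variance equals the obs variance."""
    anomalies = ens - ens.mean(axis=1, keepdims=True)
    factor = np.sqrt(obs.var() / anomalies.var())
    return anomalies * factor

# Synthetic example (hypothetical data): a shared signal, with more noise
# in the forecasts than in the verifying observations.
rng = np.random.default_rng(0)
n_members, n_times = 10, 40
signal = rng.normal(size=n_times)
ens = signal + 1.5 * rng.normal(size=(n_members, n_times))
obs = signal + 0.5 * rng.normal(size=n_times)
obs = obs - obs.mean()

scaled = scale_to_obs_variance(ens, obs)
r_act, mse_act = actual_skill(scaled, obs)
r_pot, mse_pot = potential_skill(scaled)
print("actual    r, MSE:", r_act, mse_act)
print("potential r, MSE:", r_pot, mse_pot)
```

Because a single scale factor is applied to all members, the scaling leaves both correlations unchanged and affects only the MSE comparison, which is consistent with the abstract's remark that the MSE results are obtained only after scaling.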

Full Text