Abstract

This comment focuses on the statistical limitations of the model grading applied by D. Waugh and V. Eyring (2008) (WE08). The grade g is calculated for a specific diagnostic; it essentially relates the difference between the means of model and observational data to the standard deviation of the observational dataset. We performed Monte Carlo simulations, which show that this method can lead to large 95%-confidence intervals for the grade. Moreover, the difference between two model grades often has to be very large to become statistically significant. Since the confidence intervals were not considered in detail for all diagnostics, the grading in WE08 cannot be interpreted without further analysis. The results of the statistical tests performed in WE08 agree with our findings. However, most of those tests are based on special cases, which implicitly assume that observations are available without any errors and that the interannual variability of the observational data equals that of the model data. Without these assumptions, the 95%-confidence intervals become even larger. Hence, the case in which we assumed perfect observations (i.e. ignored observational errors) provides a good estimate of an upper bound for the threshold below which a grade becomes statistically significant. Examples have shown that the 95%-confidence interval may even span the whole grading interval [0, 1]. Without considering confidence intervals, the grades presented in WE08 do not allow one to decide whether a model result deviates significantly from reality. Neither WE08 nor our comment points out which of the grades presented in WE08 exhibits such a significant deviation. However, our analysis of the grading method demonstrates the unacceptably high potential for these grades to be insignificant. This implies that the grades given by WE08 cannot be interpreted by the reader.
We further show that the inclusion of confidence intervals into the grading approach is necessary, since otherwise even a perfect model may get a low grade.
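The Monte Carlo argument above can be sketched in a few lines. The following is a minimal illustration, not WE08's actual setup: it assumes the grade has the form g = 1 − |μ_model − μ_obs| / (n_g·σ_obs) with n_g = 3 and negative values clipped to zero (our reading of WE08's metric), and it picks an illustrative record length (20 years) and interannual variability. Even for a perfect model, i.e. model and observations drawn from the same distribution, the finite sample length spreads the grade well below 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def grade(model, obs, n_g=3.0):
    """WE08-style grade: relates the model-observation mean difference
    to the observational standard deviation, clipped to [0, 1].
    (Form and scaling factor n_g = 3 are our reading of WE08.)"""
    g = 1.0 - abs(model.mean() - obs.mean()) / (n_g * obs.std(ddof=1))
    return max(g, 0.0)

# Monte Carlo estimate of the 95%-confidence interval of the grade
# for a *perfect* model: model and observations are drawn from the
# same distribution (illustrative record length and variability).
n_years = 20      # length of the observational record (assumed)
sigma = 1.0       # interannual standard deviation (assumed)
n_trials = 10_000

grades = np.empty(n_trials)
for i in range(n_trials):
    obs = rng.normal(0.0, sigma, n_years)
    model = rng.normal(0.0, sigma, n_years)  # perfect model by construction
    grades[i] = grade(model, obs)

lo, hi = np.percentile(grades, [2.5, 97.5])
print(f"95%-confidence interval of g for a perfect model: [{lo:.2f}, {hi:.2f}]")
```

With these assumptions the lower bound of the interval falls noticeably below 1, illustrating the comment's point that a perfect model may receive a low grade purely by sampling variability; shorter records or observational errors widen the interval further.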

Highlights

  • Waugh and Eyring (2008) (WE08) applied a set of performance metrics to climate-chemistry models (CCMs) aiming at quantifying their ability to reproduce key processes relevant for stratospheric ozone

  • That is exactly what the first two sentences of the abstract of Waugh and Eyring (WE08 in the following) are about: “A set of performance metrics is applied to stratospheric-resolving chemistry-climate models (CCMs) to quantify their ability to reproduce key processes relevant for stratospheric ozone”

  • In the paper “Quantitative performance metrics for stratospheric-resolving chemistry-climate models” by Waugh and Eyring (2008), a method was introduced that converts the outcome of a diagnostic, i.e. a comparison of climate-chemistry model data and observational data, into a grade



Introduction

Waugh and Eyring (2008) (WE08) applied a set of performance metrics to climate-chemistry models (CCMs), aiming to quantify their ability to reproduce key processes relevant for stratospheric ozone. These performance metrics are used to calculate a quantitative measure of performance, i.e. a grade. In Section 5, the implications for the grading are discussed when statistical significance levels are included in the grading approach. This illustrates the difference between the information on model performance presented in WE08 and the statistically sound information.
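For reference, the grade discussed here can be written compactly. Assuming the form and scaling factor n_g = 3 as we read them from WE08, with model mean, observational mean, and observational interannual standard deviation for a given diagnostic:

```latex
g = 1 - \frac{1}{n_g}\,
    \frac{\lvert \mu_{\mathrm{model}} - \mu_{\mathrm{obs}} \rvert}
         {\sigma_{\mathrm{obs}}},
\qquad n_g = 3, \qquad g := \max(g, 0),
```

so that g lies in the grading interval [0, 1]. Because the sample means and the sample standard deviation are themselves random quantities estimated from finite records, g inherits their sampling uncertainty, which is the starting point of the confidence-interval analysis in this comment.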

What is a grading?
When is a model result representing an observation?
When is a model better than another?
Terms and definitions
A perfect model and perfect observations
A perfect model and imperfect observations
Two identical models
Consequences for the grading
Konf95
Findings
Conclusions
