Model Diagnostics and Forecast Evaluation for Quantiles

Tilmann Gneiting, Sebastian Lerch, Johannes Resin, Melanie Schienle, Veit Hagenmeyer, Johannes Bracher, Kaleb Phipps, Kristof Kraus, Daniel Wolffram, Alexander I Jordan, Timo Dimitriadis

doi:10.1146/annurev-statistics-032921-020240

Abstract

Model diagnostics and forecast evaluation are closely related tasks, with the former concerning in-sample goodness (or lack) of fit and the latter addressing predictive performance out of sample. We review the ubiquitous setting in which forecasts are cast in the form of quantiles or quantile-bounded prediction intervals. We distinguish unconditional calibration, which corresponds to classical coverage criteria, from the stronger notion of conditional calibration, as can be visualized in quantile reliability diagrams. Consistent scoring functions, including but not limited to the widely used asymmetric piecewise linear score or pinball loss, provide for comparative assessment and ranking, and link to the coefficient of determination and skill scores. We illustrate the use of these tools on Engel's food expenditure data, the Global Energy Forecasting Competition 2014, and the US COVID-19 Forecast Hub.
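
As a minimal sketch of two quantities named in the abstract, the Python snippet below computes the mean pinball loss in its standard form, (1{y <= q} - alpha)(q - y), and the empirical coverage used to check unconditional calibration. The function names `pinball_loss` and `empirical_coverage` and the toy numbers are assumptions made for illustration only; they are not taken from the article or from any particular library.

```python
import numpy as np


def pinball_loss(y, q, alpha):
    """Mean asymmetric piecewise linear (pinball) score.

    y: observed outcomes; q: quantile forecasts at level alpha (0 < alpha < 1).
    Smaller values indicate better forecasts.
    """
    y = np.asarray(y, dtype=float)
    q = np.asarray(q, dtype=float)
    # Indicator that the outcome fell at or below the forecast quantile.
    hit = (y <= q).astype(float)
    # (1{y <= q} - alpha) * (q - y) is nonnegative for every pair (q, y).
    return float(np.mean((hit - alpha) * (q - y)))


def empirical_coverage(y, q):
    """Fraction of outcomes at or below the forecast quantile.

    Under unconditional calibration this should be close to alpha.
    """
    y = np.asarray(y, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.mean(y <= q))


# Toy data (hypothetical): a constant median forecast of 2 for four outcomes.
y_obs = [1.0, 2.0, 3.0, 4.0]
q_med = [2.0, 2.0, 2.0, 2.0]
score = pinball_loss(y_obs, q_med, alpha=0.5)
coverage = empirical_coverage(y_obs, q_med)
```

In this toy case half of the outcomes fall at or below the forecast, which matches the nominal level 0.5. Coverage alone says nothing about conditional calibration, which is why the abstract separates the two notions.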

Full Text