Abstract

Forecasters routinely calibrate their confidence in model forecasts. Ensembles inherently estimate forecast confidence but are often underdispersive, and ensemble spread does not strongly correlate with ensemble-mean error. This misalignment between ensemble spread and skill motivates new methods for “forecasting forecast skill” so that forecasters can better utilize ensemble guidance. We have trained logistic regression and random forest models to predict the skill of composite reflectivity forecasts from the NSSL Warn-on-Forecast System (WoFS), a 3-km ensemble that generates rapidly updating forecast guidance at 0–6-h lead times. The forecast skill predictions are valid at 1-, 2-, or 3-h lead times within localized regions determined by the observed storm locations at analysis time. We use WoFS analysis and forecast output and NSSL Multi-Radar/Multi-Sensor composite reflectivity for 106 cases from the 2017–2021 NOAA Hazardous Weather Testbed Spring Forecasting Experiments. We frame the prediction task as a multiclass classification problem: the forecast skill labels are determined by averaging the extended fraction skill scores (eFSSs) for several reflectivity thresholds and verification neighborhoods and then converting to one of three classes based on where the average eFSS ranks within the entire dataset: POOR (bottom 20%), FAIR (middle 60%), or GOOD (top 20%). Initial machine learning (ML) models are trained on 323 predictors; reducing to 10 or 15 predictors in the final models only modestly reduces skill. The final models substantially outperform carefully developed persistence- and spread-based models and are reasonably explainable. The results suggest that ML can be a valuable tool for guiding user confidence in convection-allowing (and larger-scale) ensemble forecasts.

Significance Statement

Some numerical weather prediction (NWP) forecasts are more likely to verify than others.
Forecasters often recognize situations where NWP output should be trusted more or less than usual, but objective methods for “forecasting forecast skill” are notably lacking for thunderstorm-scale models. Better estimates of forecast skill can benefit society through more accurate forecasts of high-impact weather. Machine learning (ML) provides a powerful framework for relating forecast skill to the characteristics of model forecasts and available observations over many previous cases. ML models can leverage these relationships to predict forecast skill for new cases in real time. We demonstrate the effectiveness of this approach to forecasting forecast skill using a cutting-edge thunderstorm prediction system together with logistic regression and random forest models. Based on this success, we recommend the adoption of similar ML-based methods for other prediction models.
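The percentile-based labeling scheme described in the abstract (bottom 20% POOR, middle 60% FAIR, top 20% GOOD) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the function name `label_skill` is hypothetical, and the input is assumed to be the per-forecast eFSS already averaged over reflectivity thresholds and verification neighborhoods.

```python
import numpy as np

def label_skill(mean_efss):
    """Assign POOR/FAIR/GOOD labels from dataset-wide eFSS percentiles.

    mean_efss: 1-D array of per-forecast eFSS values, each averaged over
    several reflectivity thresholds and verification neighborhoods.
    """
    mean_efss = np.asarray(mean_efss, dtype=float)
    # Cutoffs separating the bottom 20% and top 20% of the full dataset.
    lo, hi = np.percentile(mean_efss, [20, 80])
    return np.where(mean_efss <= lo, "POOR",
                    np.where(mean_efss >= hi, "GOOD", "FAIR"))
```

Because the cutoffs are computed from the entire dataset, the class proportions are fixed by construction (roughly 20/60/20), which keeps the multiclass problem balanced in a controlled way rather than tied to absolute eFSS values.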