Money Doesn't Grow on Trees, but Forecasts Do: Forecasting Extreme Precipitation with Random Forests

Gregory R Herman, Russ S Schumacher

doi:10.1175/mwr-d-17-0250.1

Open Access

https://doi.org/10.1175/mwr-d-17-0250.1

Journal: Monthly Weather Review	Publication Date: May 1, 2018
Citations: 84	License type: implied-oa

Affiliation: Colorado State University

Abstract

Approximately 11 years of reforecasts from NOAA’s Second-Generation Global Ensemble Forecast System Reforecast (GEFS/R) model are used to train a contiguous United States (CONUS)-wide gridded probabilistic prediction system for locally extreme precipitation. This system is developed primarily using the random forest (RF) algorithm. Locally extreme precipitation is quantified for 24-h precipitation accumulations in the framework of average recurrence intervals (ARIs), with two severity levels: 1- and 10-yr ARI exceedances. Forecasts are made from 0000 UTC forecast initializations for two 1200–1200 UTC periods: days 2 and 3, comprising, respectively, forecast hours 36–60 and 60–84. Separate models are trained for each of eight forecast regions and for each forecast lead time. GEFS/R predictors vary in space and time relative to the forecast point and include not only the quantitative precipitation forecast (QPF) output from the model, but also variables that characterize the meteorological regime, including winds, moisture, and instability. Numerous sensitivity experiments are performed to determine the effects of the inclusion or exclusion of different aspects of forecast information in the model predictors, the choice of statistical algorithm, and the effect of performing dimensionality reduction via principal component analysis as a preprocessing step. Overall, it is found that the machine learning (ML)-based forecasts add significant skill over exceedance forecasts produced from both the raw GEFS/R ensemble QPFs and from the European Centre for Medium-Range Weather Forecasts’ (ECMWF) global ensemble across almost all regions of the CONUS. ML-based forecasts are found to be underconfident, while raw ensemble forecasts are highly overconfident.

Full Text