Abstract

Cross-validated principal component regression (PCR) is widely used in day-to-day operational forecasting systems for seasonal river runoff volume in western North America. Complexity is increasing both in predictor datasets (including climate-science products) and in the predictive models used in place of linear regression within the PCR framework (including artificial intelligence), potentially complicating cross-validation for model evaluation. We explored these issues with 300 modeling experiments on two high-impact and hydroclimatically diverse basins in the western United States, the Truckee River (Sierra Nevada) and the Rio Grande headwaters (southern Rockies), using five different PCR and PCR-like machine learning models. The results suggest that out-of-sample error is satisfactorily estimated by applying cross-validation to only the final, supervised-learning step of PCR/PCR-like procedures. This outcome facilitates streamlined algorithms and potentially reduced computation times for more complex emerging model architectures and datasets; provides reassurance about a possible inability to perform genuinely complete cross-validation when predictors include certain complex, externally sourced data products; and may reflect mitigation of overtraining by the geophysical process-informed model development protocols normally used during feature selection in operational water supply forecasting (WSF). The results provide practical guidance to support the design of next-generation WSF models.
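
The distinction between cross-validating only the final supervised step and cross-validating the whole PCR procedure can be illustrated with a minimal sketch. It is not the authors' models, data, or experimental design. It assumes synthetic data, a single retained principal component, ordinary least squares as the final step, and leave-one-out cross-validation. All function and variable names (`leading_mode`, `loocv_rmse`, `refit_pca`) are hypothetical, and only the Python standard library is used to keep the example self-contained.

```python
import math
import random


def leading_mode(rows, iters=200):
    """Column means and leading PCA loading vector (power iteration on X^T X)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    xc = [[r[j] - means[j] for j in range(p)] for r in rows]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        scores = [sum(r[j] * v[j] for j in range(p)) for r in xc]
        w = [sum(xc[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return means, v


def project(row, means, v):
    """Score of one predictor row on the leading mode."""
    return sum((row[j] - means[j]) * v[j] for j in range(len(v)))


def fit_line(t, y):
    """Ordinary least squares of y on a single score t: (intercept, slope)."""
    n = len(t)
    tm, ym = sum(t) / n, sum(y) / n
    sxx = sum((a - tm) ** 2 for a in t)
    slope = sum((a - tm) * (b - ym) for a, b in zip(t, y)) / sxx
    return ym - slope * tm, slope


def loocv_rmse(X, y, refit_pca):
    """Leave-one-out RMSE of one-component PCR.

    refit_pca=False: PCA is fitted once on all years; only the regression
                     step is cross-validated.
    refit_pca=True:  PCA and regression are both refitted in every fold.
    """
    means_all, v_all = leading_mode(X)
    sq_err = []
    for k in range(len(X)):
        X_tr, y_tr = X[:k] + X[k + 1:], y[:k] + y[k + 1:]
        means, v = leading_mode(X_tr) if refit_pca else (means_all, v_all)
        t_tr = [project(r, means, v) for r in X_tr]
        a, b = fit_line(t_tr, y_tr)
        pred = a + b * project(X[k], means, v)
        sq_err.append((pred - y[k]) ** 2)
    return math.sqrt(sum(sq_err) / len(sq_err))


# Synthetic stand-in data (an assumption): 30 "years", 4 correlated predictors.
rng = random.Random(0)
signal = [rng.gauss(0, 1) for _ in range(30)]
X = [[s + rng.gauss(0, 0.3) for _ in range(4)] for s in signal]
y = [2.0 * s + rng.gauss(0, 0.5) for s in signal]

partial_cv = loocv_rmse(X, y, refit_pca=False)
full_cv = loocv_rmse(X, y, refit_pca=True)
print(f"regression-only CV RMSE: {partial_cv:.3f}")
print(f"full-procedure CV RMSE:  {full_cv:.3f}")
```

Comparing the two printed values on a given dataset shows how much the error estimate changes when the unsupervised PCA step is left outside the cross-validation loop. On real predictors the gap would depend on record length, predictor count, and how features were selected.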
