Abstract

Hydroelectric power generation, water supplies for municipal, agricultural, manufacturing, and service industry uses including technology-sector requirements, dam safety, flood control, recreational uses, and ecological and legal constraints all place simultaneous, competing demands on the heavily stressed water management infrastructure of the mostly arid American West. Optimally managing these resources depends on predicting water availability. We built a probabilistic nonlinear regression water supply forecast (WSF) technique for the US Department of Agriculture (USDA), which runs the largest stand-alone WSF system in the US West. Design criteria included improved accuracy over the existing system; uncertainty estimates that seamlessly handle complex (heteroscedastic, non-Gaussian) prediction errors; integration of physical hydrometeorological process knowledge and domain-specific expert experience; ability to accommodate nonlinearity, model selection uncertainty and equifinality, and predictor multicollinearity and high dimensionality; and relatively easy, low-cost implementation. Some methods satisfied some of these requirements but none met all, leading us to develop a novel, interdisciplinary, and pragmatic prediction metasystem through a carefully considered synthesis of well-established, off-the-shelf components and approaches, spanning supervised and unsupervised machine learning, nonparametric statistical modeling, ensemble learning, and evolutionary optimization, focusing on maintaining but radically updating the principal components regression framework widely used for WSF. Testing this integrated multi-method prediction engine demonstrated its value for river forecasting; USDA adoption is a landmark for transitioning machine learning from research into practice in this field. Its ability to handle all the foregoing design criteria and requirements, which are not unique to WSF, suggests potential for extension to complex probabilistic prediction problems in other fields.

Highlights

  • President Teddy Roosevelt’s 1901 description of the American West, “Whoever controls the stream practically controls the land,” remains true today

  • A data-driven water supply forecast (WSF) system requires methods for addressing predictor multicollinearity and identifying multiple input signals with potential WSF predictive value, an objective means of identifying the most promising predictor variables from a pool of broadly reasonable candidates, and a regression-like model relating these to forthcoming water supply availability. These tasks are performed here using a combination of an unsupervised learning algorithm for feature extraction, an evolutionary algorithm for feature selection, and a suite of regression models embedded within that semi-automated feature generation and selection framework. The regression models were chosen for specific characteristics known to be important from WSF experience, such as ability to handle nonlinearity and heteroscedastic or non-normal error distributions, as well as for other logistical considerations, such as a proven track record, as described above in the system design criteria (Section I.B)

  • We describe a study in which a number of supervised and unsupervised machine learning, nonparametric statistical, ensemble modeling, and evolutionary optimization methods were integrated into a prediction metasystem and used to radically update and improve an existing principal components regression framework for water supply forecasting in the US West


Summary

INTRODUCTION

President Teddy Roosevelt’s 1901 description of the American West, “Whoever controls the stream practically controls the land,” remains true today. A data-driven WSF system requires methods for addressing predictor multicollinearity and identifying multiple input signals with potential WSF predictive value, an objective means of identifying the most promising predictor variables from a pool of broadly reasonable candidates, and a regression-like model relating these to forthcoming water supply availability. These tasks are performed here using a combination of an unsupervised learning algorithm for feature extraction, an evolutionary algorithm for feature selection, and a suite of regression models embedded within that semi-automated feature generation and selection framework. The regression models were chosen for specific characteristics known to be important from WSF experience, such as ability to handle nonlinearity and heteroscedastic or non-normal error distributions, as well as for other logistical considerations, such as a proven track record, as described above in the system design criteria (Section I.B). Construction emphasized a modular and flexible framework into which new methods, or probabilistic prediction products from completely different external sources (such as physical process simulation models), can be integrated in the future if desired, leaving as many development and refinement options open as pragmatically possible.
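As a rough illustration of the principal components regression idea that the framework builds on, the following minimal sketch extracts one leading component from a set of collinear predictors and regresses seasonal flow volume on it. It is not the authors' metasystem: the evolutionary feature selection, the nonlinear and probabilistic regression models, and the ensemble aggregation are all omitted. The function names (`standardize`, `leading_component`, `fit_pcr`, `r_squared`), the synthetic data, and the choice of a single component are assumptions made purely for illustration.

```python
import math
import random


def standardize(columns):
    """Center each predictor column and scale it to unit sample variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        out.append([(v - mean) / sd for v in col])
    return out


def leading_component(z, iters=200):
    """Loadings of the first principal component, found by power iteration
    on the correlation matrix of the standardized columns z."""
    p, n = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(corr[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


def fit_pcr(columns, y):
    """Regress y on the scores of the leading principal component."""
    z = standardize(columns)
    w = leading_component(z)
    scores = [sum(w[j] * z[j][t] for j in range(len(z))) for t in range(len(y))]
    mean_s = sum(scores) / len(scores)
    mean_y = sum(y) / len(y)
    slope = (sum((s - mean_s) * (v - mean_y) for s, v in zip(scores, y))
             / sum((s - mean_s) ** 2 for s in scores))
    intercept = mean_y - slope * mean_s
    fitted = [intercept + slope * s for s in scores]
    return w, fitted


def r_squared(y, fitted):
    """In-sample coefficient of determination."""
    mean_y = sum(y) / len(y)
    sse = sum((v - f) ** 2 for v, f in zip(y, fitted))
    sst = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - sse / sst


# Synthetic data (hypothetical): three collinear predictors, such as snowpack
# measurements at nearby sites, all driven by one shared climate signal.
rng = random.Random(0)
signal = [rng.gauss(0, 1) for _ in range(40)]
predictors = [[base + gain * s + rng.gauss(0, 0.3) for s in signal]
              for base, gain in [(10, 3), (20, 5), (15, 4)]]
volume = [100 + 40 * s + rng.gauss(0, 5) for s in signal]

loadings, fitted = fit_pcr(predictors, volume)
r2 = r_squared(volume, fitted)
print("loadings:", [round(v, 3) for v in loadings])
print("R^2:", round(r2, 3))
```

Collapsing the correlated inputs into one component sidesteps the multicollinearity that would destabilize an ordinary regression on the raw predictors. In the system described here, the choice of which candidate predictors enter this step is made by an evolutionary algorithm, and the final linear fit is replaced by a suite of regression models.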

FEATURE CREATION BY UNSUPERVISED LEARNING
MULTI-METHOD ENSEMBLE
OPTIMAL FEATURE SELECTION USING EVOLUTIONARY COMPUTING
MODEL OUTPUT AGGREGATION AND PRACTICAL QUALITY CONTROL
Findings
CONCLUSION