A Machine Learning Metasystem for Robust Probabilistic Nonlinear Regression-Based Forecasting of Seasonal Water Availability in the US West

Sean W Fleming, Angus G Goodbody

doi:10.1109/access.2019.2936989

Abstract

Hydroelectric power generation, water supplies for municipal, agricultural, manufacturing, and service industry uses (including technology-sector requirements), dam safety, flood control, recreational uses, and ecological and legal constraints all place simultaneous, competing demands on the heavily stressed water management infrastructure of the mostly arid American West. Optimally managing these resources depends on predicting water availability. We built a probabilistic nonlinear regression water supply forecast (WSF) technique for the US Department of Agriculture, which runs the largest stand-alone WSF system in the US West. Design criteria included improved accuracy over the existing system; uncertainty estimates that seamlessly handle complex (heteroscedastic, non-Gaussian) prediction errors; integration of physical hydrometeorological process knowledge and domain-specific expert experience; ability to accommodate nonlinearity, model selection uncertainty and equifinality, and predictor multicollinearity and high dimensionality; and relatively easy, low-cost implementation. Some methods satisfied some of these requirements but none met all. This led us to develop a novel, interdisciplinary, and pragmatic prediction metasystem through a carefully considered synthesis of well-established, off-the-shelf components and approaches, spanning supervised and unsupervised machine learning, nonparametric statistical modeling, ensemble learning, and evolutionary optimization. The focus was on maintaining but radically updating the principal components regression framework widely used for WSF. Testing this integrated multi-method prediction engine demonstrated its value for river forecasting; USDA adoption is a landmark for transitioning machine learning from research into practice in this field. Its ability to handle all the foregoing design criteria and requirements, which are not unique to WSF, suggests potential for extension to complex probabilistic prediction problems in other fields.

Highlights

President Teddy Roosevelt’s 1901 description of the American West, “Whoever controls the stream practically controls the land,” remains true today.
A data-driven water supply forecast (WSF) system requires methods for addressing predictor multicollinearity, identifying multiple input signals with potential WSF predictive value, objectively identifying the most promising predictor variables from a pool of broadly reasonable candidates, and relating these to forthcoming water supply availability using a regression-like model. These tasks are performed here using a combination of an unsupervised learning algorithm for feature extraction, an evolutionary algorithm for feature selection, and a suite of regression models embedded within that semi-automated feature generation and selection framework. The regression models were chosen for specific characteristics known to be important from WSF experience, such as the ability to handle nonlinearity and heteroscedastic or non-normal error distributions, and for logistical considerations such as a proven track record, as described in the system design criteria (Section I.B).
We describe a study in which a number of supervised and unsupervised machine learning, nonparametric statistical, ensemble modeling, and evolutionary optimization methods were integrated into a prediction metasystem and used to radically update and improve an existing principal components regression framework for water supply forecasting in the US West.

Summary

INTRODUCTION

President Teddy Roosevelt’s 1901 description of the American West, “Whoever controls the stream practically controls the land,” remains true today. A data-driven WSF system requires methods for addressing predictor multicollinearity, identifying multiple input signals with potential WSF predictive value, objectively identifying the most promising predictor variables from a pool of broadly reasonable candidates, and relating these to forthcoming water supply availability using a regression-like model. These tasks are performed here using a combination of an unsupervised learning algorithm for feature extraction, an evolutionary algorithm for feature selection, and a suite of regression models embedded within that semi-automated feature generation and selection framework. The regression models were chosen for specific characteristics known to be important from WSF experience, such as the ability to handle nonlinearity and heteroscedastic or non-normal error distributions, and for logistical considerations such as a proven track record, as described in the system design criteria (Section I.B). Construction emphasized a modular and flexible framework into which new methods, or probabilistic prediction products from completely different external sources (such as physical process simulation models), can be integrated in the future if desired, leaving as many development and refinement options open as pragmatically possible.
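The three steps named above (unsupervised feature extraction, evolutionary feature selection, and a regression model) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes principal component analysis as the feature extractor, a very simple genetic algorithm as the evolutionary search, ordinary least squares as a stand-in for the suite of regression models, and leave-one-out cross-validated RMSE as the selection criterion. The function names (`pca_scores`, `loocv_rmse`, `evolve_mask`) and the synthetic "snowpack" data are hypothetical and exist only for illustration.

```python
import numpy as np

def pca_scores(X):
    """Standardize predictors and return principal component scores.

    The scores are mutually uncorrelated, which sidesteps multicollinearity
    among the raw predictors.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U * S

def loocv_rmse(F, y):
    """Leave-one-out RMSE of a least-squares fit of y on features F plus intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), F])
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        errs[i] = y[i] - A[i] @ coef
    return float(np.sqrt(np.mean(errs ** 2)))

def evolve_mask(scores, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search for the subset of PC scores that minimizes leave-one-out RMSE.

    Each individual is a boolean mask over the columns of `scores`.
    """
    rng = np.random.default_rng(seed)
    k = scores.shape[1]

    def fitness(mask):
        # An empty feature set is not a valid model
        return loocv_rmse(scores[:, mask], y) if mask.any() else np.inf

    # Start from the "use every component" model as the incumbent best
    best_mask = np.ones(k, dtype=bool)
    best_fit = fitness(best_mask)
    pop = rng.random((pop_size, k)) < 0.5
    for _ in range(generations):
        fits = np.array([fitness(m) for m in pop])
        order = np.argsort(fits)
        if fits[order[0]] < best_fit:
            best_fit, best_mask = fits[order[0]], pop[order[0]].copy()
        # Truncation selection: the better half of the population become parents
        parents = pop[order[: pop_size // 2]]
        idx_a = rng.integers(0, len(parents), pop_size)
        idx_b = rng.integers(0, len(parents), pop_size)
        # Uniform crossover followed by bit-flip mutation
        cross = rng.random((pop_size, k)) < 0.5
        children = np.where(cross, parents[idx_a], parents[idx_b])
        children ^= rng.random((pop_size, k)) < p_mut
        children[0] = best_mask  # elitism: carry the incumbent forward
        pop = children
    return best_mask, best_fit

# Synthetic example: three collinear "snowpack" predictors plus three noise predictors
rng = np.random.default_rng(42)
n = 40
snow = rng.normal(size=n)
X = np.column_stack(
    [snow + 0.1 * rng.normal(size=n) for _ in range(3)]
    + [rng.normal(size=n) for _ in range(3)]
)
y = 2.0 * snow + 0.3 * rng.normal(size=n)

scores = pca_scores(X)
mask, rmse = evolve_mask(scores, y)
print("selected components:", np.flatnonzero(mask), "LOOCV RMSE:", round(rmse, 3))
```

Because the search starts from the all-components model and keeps the best mask seen so far, the returned cross-validated error can never be worse than that of plain principal components regression on every component. A probabilistic forecast as described in the text would additionally need a model of the prediction error distribution, which this sketch omits.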

FEATURE CREATION BY UNSUPERVISED LEARNING

MULTI-METHOD ENSEMBLE

OPTIMAL FEATURE SELECTION USING EVOLUTIONARY COMPUTING

MODEL OUTPUT AGGREGATION AND PRACTICAL QUALITY CONTROL

Findings

CONCLUSION

Journal: IEEE Access	Publication Date: Jan 1, 2019
Citations: 37	License type: CC BY 4.0


