Probabilistic urban water demand forecasting using wavelet-based machine learning models

Mostafa Rezaali, John Quilty, Abdolreza Karimi

doi:10.1016/j.jhydrol.2021.126358

Abstract

A recent nonlinear and multiscale framework, the Wavelet Data-Driven Forecasting Framework (WDDFF), was proposed for water resources forecasting. The main objective of this study is to explore the WDDFF for short-term urban water demand (UWD) forecasting over multiple lead times (1, 2, 3, 6, 12, 18, and 24 h ahead) by focusing on two separate issues that have yet to be considered within the framework: 1) a comparison of artificial neural network (ANN), least squares support vector machine (LSSVM), regularized extreme learning machine (RELM), and random forest (RF) models; and 2) two dataset partitioning approaches for reducing overfitting in deterministic and probabilistic machine learning (ML) models: a permutation-based approach for the deterministic models and a bootstrap-based approach for the probabilistic models. The secondary objective is to evaluate the usefulness of an input variable selection approach based on RF (RFIVS) for identifying the most important inputs to use in the ML models.
The results of a real-world UWD forecasting case study in Qom, Iran, demonstrate several noteworthy findings: 1) the probabilistic RF and its 'best' wavelet-based version provided the most accurate and reliable forecasts, with average test set Nash-Sutcliffe Efficiency Index (NASH) coefficients (i.e., across all lead times) of ∼0.80 and 0.81, respectively; 2) the permutation- and bootstrap-based dataset partitioning approaches demonstrated potential to reduce overfitting; 3) wavelet decomposition improved probabilistic and deterministic ML model performance, increasing test set NASH coefficients by 1–7% on average (across all lead times); 4) wavelet-based models provided approximately the same level of reliability as the non-wavelet-based models, but the best performing wavelet-based models improved forecast sharpness by an average of 14–24% (across all lead times); and 5) RFIVS substantially reduced the number of input variables used in the ML models (e.g., the number of inputs used in the wavelet-based models was often reduced by 50%) while still leading to improved performance over the case where all input variables were used.
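
To make the reported metrics concrete, the following is a minimal Python sketch of how such scores are commonly computed. It is not taken from the study: the function names are hypothetical, the NASH coefficient uses the standard Nash-Sutcliffe definition, and the reliability and sharpness measures (interval coverage and mean interval width) are assumed, commonly used definitions that may differ from the ones used in the paper.

```python
import numpy as np


def nash_sutcliffe(observed, forecast):
    """Standard Nash-Sutcliffe efficiency: 1 - SSE / variance about the mean.

    A value of 1 is a perfect forecast; 0 means the forecast is no better
    than always predicting the observed mean.
    """
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    residual_ss = np.sum((observed - forecast) ** 2)
    total_ss = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual_ss / total_ss


def interval_coverage_and_width(observed, lower, upper):
    """Assumed reliability/sharpness proxies for a prediction interval.

    Returns (coverage, mean_width): the fraction of observations falling
    inside [lower, upper], and the average interval width. These are
    illustrative definitions, not necessarily those used in the study.
    """
    observed = np.asarray(observed, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    inside = (observed >= lower) & (observed <= upper)
    return float(inside.mean()), float((upper - lower).mean())
```

Under these assumed definitions, a narrower mean interval width at roughly the same coverage would correspond to a sharper forecast with comparable reliability.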

Full Text