Abstract

Data-driven hydrologic modeling and forecasting with machine learning (ML) algorithms suffers from a number of pitfalls and challenges, including variable selection bias, resubstitution validation, inconsistent validation processes across algorithms, and model selection using the test set. These practices lead to incorrect model development and biased, overly optimistic performance estimates, and thus to unreliable models. This study presents a novel model-building and testing workflow that addresses these common ML challenges and pitfalls. The workflow incorporates optional variable transformation and preprocessing techniques, and applies to various ML model types, variable selection algorithms, and resampling techniques. We demonstrate its performance through streamflow forecasting for the Bow River Basin, Alberta, Canada, using four conventional ML algorithms (ANN, SVM, ELM, and RBF networks) driven by local hydrometeorological conditions and large-scale climate indices. Using the cross-validation average out-of-sample results and a separate test set, the prediction accuracy estimate bias (the relative difference between the model performance estimated on the validation sets and on a separate test set) was empirically estimated at 5.6%, 4.4%, 2.5%, and 3.0% for the Seasonal, May, June, and July models, respectively. In addition, the streamflow forecasting models had an average coefficient of determination of 0.85. Preprocessing and dimensionality reduction through principal component analysis (PCA) were detrimental to prediction accuracy. Snow water equivalent from individual snow courses proved the most important predictor of Bow River streamflow, while global climate indices, including the PDO, AMO, and PNA, increased the Nash–Sutcliffe efficiency (NSE) by 6% to 50%.
Finally, although forecasting skill decreased with increasing forecast lead time, satisfactory forecasts (NSE > 0.5) could be obtained two months ahead of the spring melt, at the end of February. Extensions of this study should address the tendency of different variable selection algorithms to pick irrelevant or redundant predictors after changes in the training data.
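The two quantities central to the abstract can be made concrete. The sketch below, in Python, computes the Nash–Sutcliffe efficiency from its standard definition and the prediction accuracy estimate bias as described above (the relative difference between a validation-based score and a separate test-set score). All function names and numerical values here are illustrative assumptions, not taken from the paper.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations
    of the observations from their mean. NSE = 1 is a perfect fit;
    NSE > 0.5 is the 'satisfactory' threshold cited in the abstract."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / ss_obs

def accuracy_estimate_bias(cv_score, test_score):
    """Relative difference between the cross-validation (out-of-sample)
    performance estimate and the separate test-set performance."""
    return abs(cv_score - test_score) / abs(test_score)

# Illustrative, made-up streamflow values (not data from the study):
obs = [10.0, 12.0, 9.0, 15.0, 11.0]
sim = [9.5, 12.5, 9.2, 14.0, 11.3]
print(round(nse(obs, sim), 3))                          # skill of sim vs obs
print(round(accuracy_estimate_bias(0.88, 0.85), 3))     # hypothetical scores
```

A small bias here indicates that the cross-validation estimate generalizes well to unseen data, which is the property the workflow is designed to preserve.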
