Abstract

Statistical learning methods offer a promising approach for low flow regionalization. We examine seven statistical learning models (lasso, linear and non-linear model-based boosting, sparse partial least squares, principal component regression, random forest, and support vector machine regression) for the prediction of winter and summer low flow based on a hydrologically diverse dataset of 260 catchments in Austria. To produce sparse models, we adapt recursive feature elimination for variable preselection and propose using three different variable ranking methods (conditional forest, lasso, and linear model-based boosting) for each of the prediction models. Results are evaluated for the low flow characteristic Q95 (Pr(Q>Q95) = 0.95), standardized by catchment area, using a repeated nested cross-validation scheme. We found generally high prediction accuracy for winter (R2CV of 0.66 to 0.70) and summer (R2CV of 0.83 to 0.86). The models perform similarly to or slightly better than a Top-kriging model that constitutes the current benchmark for the study area. The best-performing models are support vector machine regression (winter) and non-linear model-based boosting (summer), but linear models exhibit similar prediction accuracy. Variable preselection can significantly reduce the complexity of all models with only a small loss of performance. The resulting learning models are more parsimonious and thus easier to interpret and more robust when predicting at ungauged sites. A direct comparison of linear and non-linear models reveals that non-linear relationships can be sufficiently captured by linear learning models, so there is no need to use more complex models or to add non-linear effects.
For low flow regionalization in a seasonal climate, temporal stratification into summer and winter low flows was shown to increase the predictive performance of all learning models, offering an alternative to the catchment grouping that is otherwise recommended.
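
The target variable can be made concrete with a small example. The sketch below computes Q95, the flow exceeded 95% of the time (the empirical 5th percentile of a flow record), and standardizes it by catchment area. The function names, the linear interpolation between order statistics, and the output unit (l s^-1 km^-2) are assumptions made for illustration; the abstract does not state how the authors compute the quantile.

```python
def q95(flows):
    """Flow exceeded 95% of the time: the empirical 5th percentile.

    Uses linear interpolation between order statistics (an assumed
    convention; other quantile definitions give slightly different values).
    """
    if not flows:
        raise ValueError("flow record is empty")
    ordered = sorted(flows)
    # Fractional position of the 0.05 quantile in the sorted record.
    pos = 0.05 * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def specific_q95(flows_m3s, area_km2):
    """Q95 standardized by catchment area, in l s^-1 km^-2 (assumed unit)."""
    if area_km2 <= 0:
        raise ValueError("catchment area must be positive")
    return q95(flows_m3s) * 1000.0 / area_km2


if __name__ == "__main__":
    # Hypothetical daily flows in m^3/s for a 100 km^2 catchment.
    record = [float(v) for v in range(1, 102)]
    print(q95(record), specific_q95(record, 100.0))
```

For the seasonal analysis described above, the same function would be applied separately to the winter and summer portions of each record.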

Highlights

  • Estimating long-term averages of low flow in ungauged basins is crucial for a wide range of applications, e.g., water resource management and engineering, hydropower planning, or ecological issues (Smakhtin, 2001)

  • The best-performing model for winter low flow is the SVR model, with a median R2CV of 0.70 over all 10 CV runs. It is followed by the GLM (0.69), RF (0.68), and a group of equally performing models (Lasso, GAM, PCR, and sparse partial least squares (sPLS)) with an R2CV of 0.66

  • Mid-range values are best approximated by the Lasso model (RMSErel = 0.24), followed by RF and SVR; the two boosting models (0.26) and sPLS and PCR (0.27) perform slightly worse
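
The R2CV values quoted in the highlights are cross-validated coefficients of determination. As a minimal sketch of the idea, the code below pools out-of-fold predictions from a plain k-fold split and computes 1 - SSE/SST. This is a simplification: the study uses a repeated nested cross-validation scheme and multivariate learning models, and its exact R2CV formula is not given in this excerpt. The single-predictor least-squares model and all function names here are hypothetical stand-ins.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def r2_cv(xs, ys, k=5):
    """Cross-validated R^2 from pooled out-of-fold predictions."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        # Assign every k-th observation to the current test fold.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        for i in test_idx:
            preds[i] = slope * xs[i] + intercept
    mean_y = sum(ys) / n
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst


if __name__ == "__main__":
    # Hypothetical data: a catchment descriptor and a noisy response.
    x = [float(i) for i in range(20)]
    y = [2.0 * v + 1.0 + (0.5 if i % 2 else -0.5) for i, v in enumerate(x)]
    print(round(r2_cv(x, y), 3))
```

Because each prediction comes from a model that never saw the corresponding observation, this score estimates performance at ungauged sites rather than goodness of fit.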

Introduction

Estimating long-term averages of low flow in ungauged basins is crucial for a wide range of applications, e.g., water resource management and engineering, hydropower planning, or ecological issues (Smakhtin, 2001). Statistical low-flow models can be further subdivided into geostatistical models (e.g., Castiglioni et al, 2009, 2011; Laaha et al, 2014) and regression-based methods (e.g., Laaha and Blöschl, 2006, 2007); an overview is given by Salinas et al (2013). Regression methods cover a wide spectrum of models, and interest in statistical learning models in hydrology has grown, especially in the last decade (Abrahart et al, 2012; Dawson and Wilby, 2001; Nearing et al, 2021; Solomatine and Ostfeld, 2008), with the terms “statistical learning” and “machine learning” being used synonymously. Statistical learning methods are still rarely applied to low flow prediction.
