Abstract

River water temperature regulates many physical and biochemical processes in river systems. Consequently, reliable tools are needed for predicting extreme river temperatures at sites with little or no available data. This study compares two machine learning models, random forest (RF) and extreme gradient boosting (XGBoost), with non-parametric multivariate adaptive regression splines (MARS) and semi-parametric generalized additive models (GAMs) for the regional estimation of maximum water temperatures at ungauged locations. Three approaches are considered for delineating homogeneous regions in the regional frequency analysis: linear canonical correlation analysis (CCA), neural network-based non-linear canonical correlation analysis (NLCCA), and the use of all stations (ALL). The results indicate that GAM and MARS perform best. NLCCA + GAM performs best in terms of absolute and relative mean square error, followed by CCA + MARS. Using neighborhood methods significantly improves the performance of the adopted models. The two machine learning models are also tested with two variable selection methods: recursive feature elimination (RFE) and the least absolute shrinkage and selection operator (LASSO). The results, however, do not show any significant differences. These results may indicate the flexibility of the GAM and MARS approaches and their ability to reproduce thermal extremes, especially under real-world conditions when only a limited amount of data is available.
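The absolute and relative error criteria mentioned above can be illustrated with a minimal sketch. It assumes the root of the mean square error is reported and that the relative version scales each error by the observed value, as is common in regional estimation studies; the abstract does not give the exact formulas. The function names and the sample temperatures are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Absolute root mean square error, in the units of the data (e.g. degrees C)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def relative_rmse(observed, predicted):
    """Relative RMSE in percent: each error is divided by its observed value.

    Assumes no observed value is zero.
    """
    n = len(observed)
    return 100.0 * math.sqrt(
        sum(((p - o) / o) ** 2 for o, p in zip(observed, predicted)) / n
    )


# Hypothetical maximum water temperatures (degrees C) at three sites,
# each treated in turn as ungauged and estimated by a regional model.
observed = [20.0, 25.0, 10.0]
predicted = [22.0, 24.0, 10.0]

print(f"RMSE  = {rmse(observed, predicted):.3f}")
print(f"rRMSE = {relative_rmse(observed, predicted):.3f} %")
```

The relative criterion gives equal weight to sites with low and high temperatures, whereas the absolute criterion is dominated by the largest errors in degrees.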
