Enhancing spatial streamflow prediction through machine learning algorithms and advanced strategies

Sedigheh Darabi Cheghabaleki, Seyed Ehsan Fatemi, Maryam Hafezparast Mavadat

doi:10.1007/s13201-024-02154-x

Abstract

Forecasting and extending streamflow records is a critical task in hydrology, especially where local time series are unavailable. This study evaluates whether preprocessing, model fine-tuning, feature selection, or sampling is necessary to improve streamflow prediction with machine learning (ML) techniques. To this end, the monthly streamflow at Pol-Chehr station is predicted from monthly rainfall and streamflow time series recorded at other stations. Streamflow predictions under the k-fold cross-validation approach are generally better than those under the time series (TS) approach, except when raw data are used with no preprocessing or feature selection. Applying the simple support vector regression (SVR) model to raw data gives the weakest results, whereas applying the SVR model tuned by a genetic algorithm (GA-SVR) to raw data increases the Nash coefficient by about 215% and 72% and decreases the NRMSE by about 48% and 36% in the k-fold and time series approaches, respectively, even with no feature selection. Standardization, on the other hand, produces highly accurate predictions in both approaches: with the simple SVR model, the minimum Nash coefficient during the test period is 0.83 for the k-fold approach and 0.73 for the time series approach. Finally, fine-tuning ML models with optimization algorithms such as GA, and applying feature selection, does not always improve prediction accuracy; the benefit depends on whether raw or preprocessed data are used. In conclusion, combining k-fold cross-validation with preprocessing typically yields highly accurate predictions, with an R value exceeding 93.7% (Nash = 0.83, SI = 0.55, NRMSE = 0.09), without requiring any additional fine-tuning or optimization. Feature selection is significant only when the TS approach is used.
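The evaluation setup summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the Nash coefficient is the standard Nash-Sutcliffe efficiency, that NRMSE is the RMSE divided by the observed range (the abstract does not state the normalization), that standardization uses statistics from the training samples only, and that folds are contiguous and unshuffled. All function names are hypothetical.

```python
import math


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors / spread of obs)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def nrmse(obs, sim):
    """RMSE normalized by the observed range (ASSUMPTION: range normalization)."""
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))
    return rmse / (max(obs) - min(obs))


def standardize(train, test):
    """Z-score both sets using the mean and std of the training set only."""
    mu = sum(train) / len(train)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in train) / len(train))
    return [(x - mu) / sigma for x in train], [(x - mu) / sigma for x in test]


def kfold_splits(n, k):
    """k-fold approach: each contiguous fold serves once as the test set.

    ASSUMPTION: no shuffling; if n is not divisible by k, the leftover
    trailing samples are always kept in the training set.
    """
    fold = n // k
    splits = []
    for i in range(k):
        test = list(range(i * fold, (i + 1) * fold))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        splits.append((train, test))
    return splits


def chronological_split(n, n_test):
    """Time series approach: train on the earliest samples, test on the last n_test."""
    return list(range(n - n_test)), list(range(n - n_test, n))


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0, 5.0]      # made-up values, not study data
    simulated = [1.1, 1.9, 3.2, 3.8, 5.0]
    print("Nash:", nash_sutcliffe(observed, simulated))
    print("NRMSE:", nrmse(observed, simulated))
    print("k-fold:", kfold_splits(6, 3))
    print("time series:", chronological_split(6, 2))
```

In the k-fold approach every sample is used for testing exactly once, while the time series approach tests only on the most recent part of the record. This difference in sampling is one plausible reason the two approaches can rank models differently.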

Full Text