Abstract

Recently, there has been increasing emphasis on using data-driven models to forecast streamflow. However, the performance of filter-based feature selection (FFS) methods within the data-driven models used for monthly streamflow forecasting has not been studied in detail. In this study, we investigated the effectiveness of eight common FFS methods when coupled with three regression models for real-world one-month-ahead streamflow forecasting. The FFS methods were linear Pearson correlation, partial linear Pearson correlation (PCI), mutual information (MI), conditional MI, partial MI, maximal relevance minimal redundancy Pearson correlation, maximal relevance minimal redundancy MI and the gamma test. The regression models were multiple linear regression (MLR), ensemble extreme learning machine (enELM) and k-nearest neighbor (KNN) regression. The study was conducted on three cases from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) data sets. Furthermore, two termination criterion (TC) methods, the Hampel test and resampling, were compared. The results highlight three important findings. First, no single FFS method was dominant when coupled with enELM or KNN. Second, when resampling was used to select a final model from the candidate combinations of the eight FFS methods and three regression models, PCI was the most favorable FFS method for the final model. Finally, the Hampel test TC was superior to the resampling TC in terms of stability and resistance to overfitting. These findings offer practical guidance for real-world monthly streamflow forecasting.
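The abstract names the Hampel test as a termination criterion for filter-based selection but does not define it. The following is a minimal sketch under stated assumptions: a greedy forward search scores each remaining candidate by its absolute partial Pearson correlation with the target given the inputs already selected. The best candidate is accepted only if its Hampel distance, (score − median) / (1.4826 × MAD), exceeds 3. The function names (`hampel_z`, `residualize`, `abs_corr`, `select_inputs`), the threshold of 3 and the forward-search structure are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def hampel_z(scores):
    """Signed robust distance of each score from the median (Hampel statistic)."""
    scores = np.asarray(scores, dtype=float)
    med = np.median(scores)
    # 1.4826 * MAD is a robust estimate of the standard deviation
    mad = 1.4826 * np.median(np.abs(scores - med))
    if mad == 0.0:
        return np.zeros_like(scores)  # no spread: no score stands out
    return (scores - med) / mad


def residualize(v, S):
    """Residual of v after least-squares regression on the columns of S plus an intercept."""
    A = np.column_stack([np.ones(len(v)), S])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef


def abs_corr(a, b):
    """|Pearson r|, defined as 0 when either input is numerically constant."""
    if np.std(a) < 1e-12 or np.std(b) < 1e-12:
        return 0.0
    return abs(np.corrcoef(a, b)[0, 1])


def select_inputs(X, y, z_threshold=3.0):
    """Hypothetical greedy forward filter selection with a Hampel-test stop."""
    n, p = X.shape
    selected = []
    while len(selected) < p:
        S = X[:, selected]  # shape (n, 0) on the first pass
        ry = residualize(y, S)
        remaining = [j for j in range(p) if j not in selected]
        # partial correlation of each candidate with y, given the selected inputs
        scores = [abs_corr(residualize(X[:, j], S), ry) for j in remaining]
        z = hampel_z(scores)
        best = int(np.argmax(scores))
        if z[best] <= z_threshold:
            break  # best candidate is not an outlier among candidates: terminate
        selected.append(remaining[best])
    return selected


# Usage on synthetic data: y depends on columns 0 and 3 only
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 8))
y_demo = 2.0 * X_demo[:, 0] - 1.5 * X_demo[:, 3] + 0.1 * rng.normal(size=500)
print(select_inputs(X_demo, y_demo))
```

The median and MAD are used instead of the mean and standard deviation so that the few strongly relevant candidates do not inflate the spread estimate they are compared against.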
