Accurate runoff forecasts not only provide an important basis for flood control scheduling but also offer scientific support for optimizing water resources allocation, helping to maximize the overall benefits of the basin. To further explore the mechanisms by which atmosphere–ocean–land factors drive runoff changes, this study proposes a factor dimension reduction and interpretation framework based on Pearson correlation, eXtreme Gradient Boosting and SHapley Additive exPlanations (P-XGBoost-SHAP). Building on this framework, Gaussian Process Regression (GPR), Long Short-Term Memory neural network (LSTM) and Support Vector Machine (SVM) models are used to construct atmosphere–ocean–land data-driven runoff prediction models. In addition, to correct the runoff prediction residuals, this paper proposes a multi-step error correction framework based on ensemble empirical mode decomposition and an autoregressive model (EEMD-AR). A case study at the Lianghekou hydrological station shows that the factor dimension reduction and interpretation framework greatly reduces the input dimension of the model and uses SHAP values to explain the factors both globally and locally. Compared with the traditional Random Forest (RF) dimension reduction method, it yields higher prediction accuracy. With EEMD-AR correction, the Nash-Sutcliffe efficiency coefficient (NSE) increases to about 0.93, which is 4.91 % and 1.97 % higher than the series–parallel coupling (AR-Parallel) and empirical mode decomposition-autoregressive (EMD-AR) correction methods, respectively. Overall, the approach improves runoff prediction accuracy while reducing the input dimension of the model.
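For reference, the NSE used to score the forecasts is the standard Nash-Sutcliffe efficiency, NSE = 1 − Σ(Q_obs − Q_sim)² / Σ(Q_obs − mean(Q_obs))². The following is a minimal pure-Python sketch of that formula, not the study's own code; the function name `nash_sutcliffe_efficiency` and the sample flows are hypothetical and for illustration only.

```python
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Return the Nash-Sutcliffe efficiency (NSE) of simulated vs. observed runoff.

    1.0 is a perfect fit; 0.0 means the model does no better than
    predicting the mean of the observations; negative values are worse.
    (Hypothetical helper for illustration, not taken from the study.)
    """
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    if len(observed) < 2:
        raise ValueError("need at least two observations")

    mean_obs = sum(observed) / len(observed)
    # Sum of squared prediction errors
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Total squared deviation of the observations around their mean
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed series is constant; NSE is undefined")
    return 1.0 - sse / sst


if __name__ == "__main__":
    obs = [100.0, 200.0, 300.0, 400.0]  # hypothetical observed flows
    sim = [110.0, 190.0, 310.0, 390.0]  # hypothetical simulated flows
    print(round(nash_sutcliffe_efficiency(obs, sim), 3))  # 0.992
```

Because NSE normalizes the squared error by the variance of the observations, it is comparable across stations with different flow magnitudes, which is why it is a common headline metric for runoff models.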