Streamflow time series typically exhibit nonlinear and nonstationary characteristics that complicate accurate estimation. Multifactorial machine learning (ML) models have recently been developed to improve streamflow prediction performance. However, the limited interpretability of these ML models raises concerns about their inner workings and reliability. This paper introduces a hybrid architecture, the TCN-LSTM-Multihead-Attention model. It stacks two temporal convolutional network (TCN) layers and one long short-term memory (LSTM) layer and integrates them with a Multihead-Attention mechanism. The model predicts streamflow from streamflow causation-driven prediction samples (RCDP). Its behavior is examined through local and global interpretability analyses based on Shapley values and partial dependence. The find_peaks method was used to identify peak-flow events in the test dataset, in order to validate the model's generality and uncover the physical causal patterns of streamflow. The results show the following. (1) Compared with an LSTM model using the same hyperparameter settings, the proposed TCN-LSTM-Multihead-Attention hybrid model increased the test-set R² by 52.9%, 2.5%, 43.1%, and 10.7% at the four stations, respectively, when using RCDP samples. Moreover, at Hengshan station, the hybrid model's R² with RCDP samples was 5.06% and 1.22% higher than with streamflow autoregressive prediction samples (RAP) and meteorological-soil volumetric water content coupled autoregressive prediction samples (MCSAP), respectively. (2) Owing to strong autocorrelation, historical streamflow from the preceding 3 days dominates the predictions. Flow quantity (Q) typically emerges as the most important feature, with precipitation (P), surface soil moisture (SSM), and flow at adjacent stations also contributing.
(3) During low- and normal-flow periods, historical streamflow remains the most important factor. During flood periods, however, upstream inflow and precipitation play a much more pronounced role. This model enables the identification and quantification of various hydrodynamic influences on flow predictions, including upstream flood propagation, precipitation, and soil moisture conditions. It also reveals the nonlinear relationships and threshold responses learned by the model, enhancing the interpretability and reliability of streamflow predictions.
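The peak-event screening step can be sketched with SciPy's `scipy.signal.find_peaks`. This assumes that is the `find_peaks` the authors used. The helper name `identify_peak_events` is hypothetical, and so are the minimum height and minimum separation, which are placeholders rather than the paper's settings. The flow values are synthetic.

```python
import numpy as np
from scipy.signal import find_peaks


def identify_peak_events(flow, min_height, min_separation):
    """Return indices and heights of peak-flow events in a flow series.

    Hypothetical helper: `min_height` is the minimum discharge a peak must
    reach, and `min_separation` is the minimum number of time steps between
    retained peaks (smaller peaks closer than this are discarded).
    """
    flow = np.asarray(flow, dtype=float)
    peaks, props = find_peaks(flow, height=min_height, distance=min_separation)
    return peaks, props["peak_heights"]


# Synthetic daily discharge (m^3/s), for illustration only.
flow = [12, 14, 30, 55, 41, 20, 15, 13, 18, 72, 60, 25, 16, 14, 22, 19]
peaks, heights = identify_peak_events(flow, min_height=40, min_separation=3)
print(peaks.tolist(), heights.tolist())  # [3, 9] [55.0, 72.0]
```

The small local maximum at index 14 (22 m^3/s) is rejected by the height threshold. Raising `min_separation` to 7 would keep only the larger peak at index 9, because `distance` removes the smaller of two peaks that lie closer together than that many samples. In practice these thresholds would likely need tuning per station before model errors are evaluated at the detected event indices.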
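The two interpretability tools named above can be illustrated on a tiny model. Shapley values attribute a single prediction to its input features, which is the local view. Partial dependence averages predictions while one feature is swept over a grid, which is the global view.

This is a minimal stdlib sketch under stated assumptions. `toy_model` and its feature order (lagged Q, P, SSM, upstream Q) are hypothetical stand-ins for the trained network. Shapley values are computed exactly by enumerating coalitions, with absent features replaced by a single baseline vector. That enumeration is only feasible for a handful of features. Libraries such as SHAP approximate this for real models and typically average over a background dataset instead.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical predictor with a soil-moisture threshold on rainfall response."""
    q_lag, p, ssm, q_up = x
    rain_term = 0.5 * p if ssm > 0.3 else 0.1 * p  # wetter soil -> more runoff
    return 0.8 * q_lag + rain_term + 0.3 * q_up


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Classic Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def partial_dependence(model, data, feature, grid):
    """Mean prediction over `data` with column `feature` fixed at each grid value."""
    curve = []
    for v in grid:
        total = 0.0
        for row in data:
            z = list(row)
            z[feature] = v
            total += model(z)
        curve.append(total / len(data))
    return curve


x = [10.0, 20.0, 0.4, 5.0]
baseline = [8.0, 0.0, 0.2, 4.0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 3) for v in phi])  # [1.6, 6.0, 4.0, 0.3]

data = [[10.0, 20.0, 0.4, 5.0], [8.0, 0.0, 0.2, 4.0], [12.0, 10.0, 0.5, 6.0]]
print([round(v, 3) for v in partial_dependence(toy_model, data, 2, [0.2, 0.4])])  # [10.5, 14.5]
```

The Shapley values sum to `toy_model(x) - toy_model(baseline)` (11.9), which is the efficiency property. The rainfall-soil-moisture interaction is split between P and SSM. The jump in the partial-dependence curve as SSM crosses 0.3 is the kind of threshold response the abstract describes.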