An explainable deep learning model based on hydrological principles for flood simulation and forecasting
Abstract. Deep learning (DL) models generally perform well in hydrological simulation but lack a physical basis. To address this gap, we integrate the relatively complex runoff generation and flow routing principles of the Xinanjiang (XAJ) model into the architecture of recurrent neural network (RNN) units and establish a physically based XAJRNN layer. This layer is then fused with LSTM layers to construct an explainable deep learning (EDL) model, which was tested in the Lushui River and Qingjiang River basins in China. Compared to benchmark models, the proposed EDL model performs very well, with average Nash-Sutcliffe efficiency (NSE) values of 0.98 and 0.94 for the two basins, respectively. The flood peak relative error (PRE) and peak timing difference (ΔT) are close to zero, which demonstrates that the EDL model can accurately simulate flood events. Notably, incorporating physical principles into the EDL model not only improves flow simulation accuracy but also enhances interpretability, which offers fresh insights for the fusion of DL and hydrological models for flood simulation and forecasting.
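The three evaluation metrics named in the abstract (NSE, PRE, and ΔT) can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are the commonly used ones and are an assumption here; the function names and the sample hydrograph are hypothetical and are not taken from the paper.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of squared error
    to the variance of the observations (1.0 is a perfect fit)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def peak_relative_error(obs, sim):
    """Relative error of the simulated flood peak (assumed definition:
    (simulated peak - observed peak) / observed peak)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return (sim.max() - obs.max()) / obs.max()

def peak_timing_difference(obs, sim):
    """Peak timing difference in time steps (assumed definition:
    index of simulated peak minus index of observed peak)."""
    return int(np.argmax(sim)) - int(np.argmax(obs))

# Made-up flood hydrograph, for illustration only.
obs = [10.0, 20.0, 50.0, 30.0, 15.0]
sim = [12.0, 22.0, 45.0, 33.0, 14.0]

print(round(nse(obs, sim), 3))                  # 0.957
print(round(peak_relative_error(obs, sim), 3))  # -0.1
print(peak_timing_difference(obs, sim))         # 0
```

Under these definitions, an NSE near 1 together with PRE and ΔT near zero corresponds to the behaviour the abstract reports for the EDL model.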
- Research Article
52
- 10.1016/s2095-3119(13)60421-9
- Jun 27, 2013
- Journal of Integrative Agriculture
Impacts of Climate Change on Water and Agricultural Production in Ten Large River Basins in China
- Preprint Article
- 10.5194/egusphere-egu25-19190
- Mar 15, 2025
Accurate streamflow estimation is crucial for effective water resource management and flood forecasting. However, physics-based hydrological models fail to respond promptly to rapid hydrological events because model calibration and computation are inefficient for large-scale catchments, while existing deep learning models tend to neglect the physical processes of runoff transfer, failing to account for the spatial and temporal dependencies inherent in runoff dynamics. In this study, we propose a topological process-based model that integrates Graph Attention Networks (GAT) to capture the spatial topology of runoff transfer and Long Short-Term Memory (LSTM) networks to simulate the temporal transfer between upstream and downstream runoff. The model was applied to the Yangtze River Basin, the largest river basin in China, to predict streamflow at 10 km spatial resolution. Validation results show that our model achieves a median Nash-Sutcliffe Efficiency (NSE) value of 0.783 at secondary outlet stations across the basin and effectively simulates streamflow peaks caused by flooding. Additionally, the model can simulate the spatial distribution of daily streamflow for an entire year within 10 seconds, providing a significant computational speedup compared to physical process-based river confluence models. This work represents a step towards more efficient and responsive prediction of extreme hydrological events using deep learning models.
- Research Article
15
- 10.1016/j.jhydrol.2024.131434
- Jun 1, 2024
- Journal of Hydrology
A process-driven deep learning hydrological model for daily rainfall-runoff simulation
- Research Article
89
- 10.1016/j.egyr.2022.07.139
- Aug 3, 2022
- Energy Reports
Multivariate time series prediction by RNN architectures for energy consumption forecasting
- Research Article
59
- 10.3390/w10050642
- May 16, 2018
- Water
The accuracy and sufficiency of precipitation data play a key role in environmental research and hydrological models. They have a significant effect on the simulation results of hydrological models; therefore, reliable hydrological simulation in data-scarce areas is a challenging task. Advanced techniques can be utilized to improve the accuracy of satellite-derived rainfall data, which can be used to overcome the problem of data scarcity. Our study aims to (1) assess the accuracy of different satellite precipitation products such as Tropical Rainfall Measuring Mission (TRMM 3B42 V7), Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN), PERSIANN-Climate Data Record (PERSIANN-CDR), and China Meteorological Assimilation Driving Datasets for the SWAT Model (CMADS) by comparing them with gauged rainfall data; and (2) apply them for runoff simulations for the Han River Basin in South Korea using the SWAT model. Based on the statistical measures, that is, the proportion correct (PC), the probability of detection (POD), the frequency bias index (FBI), the index of agreement (IOA), the root-mean-square-error (RMSE), the mean absolute error (MAE), the coefficient of determination (R2), and the bias, the rainfall data of the TRMM and CMADS show a better accuracy than those of PERSIANN and PERSIANN-CDR when compared to rain gauge measurements. The TRMM and CMADS data capture the spatial rainfall patterns in mountainous areas as well. The streamflow simulated by the SWAT model using ground-based rainfall data agrees well with the observed streamflow with an average Nash-Sutcliffe efficiency (NSE) of 0.68. The four satellite rainfall products were used as inputs in the SWAT model for streamflow simulation and the results were compared. 
The average R2, NSE, and percent bias (PBIAS) show that hydrological models using TRMM (R2 = 0.54, NSE = 0.49, PBIAS = [−52.70–28.30%]) and CMADS (R2 = 0.44, NSE = 0.42, PBIAS = [−29.30–41.80%]) data perform better than those utilizing PERSIANN (R2 = 0.29, NSE = 0.13, PBIAS = [38.10–83.20%]) and PERSIANN-CDR (R2 = 0.25, NSE = 0.16, PBIAS = [12.70–71.20%]) data. Overall, the results of this study are satisfactory, given that rainfall data obtained from TRMM and CMADS can be used to simulate the streamflow of the Han River Basin with acceptable accuracy. Based on these results, TRMM and CMADS rainfall data play important roles in hydrological simulations and water resource management in the Han River Basin and in other regions with similar climate and topographical characteristics.
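Several of the statistical measures listed in this abstract can be sketched in a few lines. The abstract does not state its formulas, so the sketch assumes the standard contingency-table definitions of PC, POD, and FBI and the convention PBIAS = 100 × Σ(obs − sim) / Σ(obs); sign conventions for PBIAS differ between papers. The 1.0 mm rain/no-rain threshold, the function names, and the sample values are hypothetical.

```python
import numpy as np

def pbias(obs, sim):
    """Percent bias, assuming the convention 100 * sum(obs - sim) / sum(obs).
    Under this convention, positive values mean the simulation underestimates."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)

def detection_scores(gauge, satellite, threshold=1.0):
    """Categorical rain-detection scores from a 2x2 contingency table.
    `threshold` (mm) separating rain from no rain is a hypothetical choice.
    Assumes the gauge record contains at least one rain event."""
    g = np.asarray(gauge, dtype=float) >= threshold
    s = np.asarray(satellite, dtype=float) >= threshold
    hits = np.sum(g & s)
    misses = np.sum(g & ~s)
    false_alarms = np.sum(~g & s)
    correct_negatives = np.sum(~g & ~s)
    total = hits + misses + false_alarms + correct_negatives
    return {
        "PC": (hits + correct_negatives) / total,       # proportion correct
        "POD": hits / (hits + misses),                  # probability of detection
        "FBI": (hits + false_alarms) / (hits + misses), # frequency bias index
    }

# Made-up daily rainfall (mm), for illustration only.
gauge = [0.0, 5.0, 12.0, 0.0, 3.0, 0.0]
satellite = [0.0, 4.0, 0.5, 2.0, 6.0, 0.0]
scores = detection_scores(gauge, satellite)

# Made-up streamflow values, for illustration only.
flow_bias = pbias([10.0, 20.0, 30.0], [12.0, 18.0, 24.0])
print(scores, flow_bias)
```

An FBI of 1 indicates that the satellite product reports rain days as often as the gauges do, although it says nothing about whether the days coincide; POD and PC capture that part.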
- Research Article
48
- 10.3390/rs9111176
- Nov 21, 2017
- Remote Sensing
This study aimed to statistically and hydrologically assess the performance of the four latest, widely used satellite–gauge combined precipitation estimates (SGPEs), namely CRT (CMORPH CRT), BLD (CMORPH BLD), CDR (PERSIANN CDR), and 3B42 (TMPA 3B42 version 7), over the Upper Yellow River Basin (UYRB) in China during 2001–2012. The performance of the SGPEs was compared with the Chinese Meteorological Administration (CMA) datasets using the Variable Infiltration Capacity (VIC) land surface hydrologic model. Results indicated that, irrespective of slight underestimation in the western mountains and overestimation in the southeast, the four SGPEs could generally capture the spatial distribution of precipitation well. Although 3B42 exhibited a better performance in capturing the spatial distribution of daily average precipitation, BLD agreed best with CMA in the time series of watershed average precipitation, which resulted in BLD having a comparable performance to the CMA in the long-term hydrological simulations. Moreover, the potential for disastrous heavy rain is concentrated in the southeastern corner of the basin; CRT and BLD were closer to the CMA in the distribution of extreme precipitation events, while 3B42 and CDR overestimated extreme precipitation, especially over the southeast of the UYRB. Therefore, CRT and BLD were able to match the high peak discharges very well for the wet seasons, while 3B42 and CDR overestimated the high peak discharges. In addition, the four SGPEs performed well for the 2005 flood event but poorly for the 2012 flood event. Results indicate that the four SGPEs should be applied with caution when simulating massive flood events over the UYRB.
- Research Article
1
- 10.1007/s00521-024-09824-6
- Jul 2, 2024
- Neural Computing and Applications
Recurrent neural networks (RNN) are ubiquitous computing systems for sequences and multivariate time-series data. While several robust RNN architectures are known, it is unclear how to relate RNN initialization, architecture, and other hyperparameters with accuracy for a given task. In this work, we propose treating RNN as dynamical systems and correlating hyperparameters with accuracy through Lyapunov spectral analysis, a methodology designed explicitly for nonlinear dynamical systems. To address the fact that RNN features go beyond the existing Lyapunov spectral analysis, we propose to infer relevant features from the Lyapunov spectrum with an Autoencoder and an embedding of its Latent representation (AeLLE). Our studies of various RNN architectures show that AeLLE successfully correlates RNN Lyapunov spectrum with accuracy. Furthermore, the Latent representation learned by AeLLE is generalizable to novel inputs from the same task and is formed early in the process of RNN training. The latter property allows for predicting the accuracy to which RNN would converge when training is complete. We conclude that the representation of RNN through the Lyapunov spectrum, along with AeLLE, provides a novel method for the organization and interpretation of variants of RNN architectures.
- Research Article
13
- 10.13031/2013.25247
- Jan 1, 2008
- Transactions of the ASABE
Tillage and pesticide management are important factors controlling pesticide losses from agricultural watersheds. In this research, tillage activities were mapped from Landsat TM and MODIS data and were used in Soil and Water Assessment Tool (SWAT) model to simulate atrazine concentrations in the St. Joseph River in northeastern Indiana. The calibrated and validated model proved to be crucial in making early warning predictions and decisions on atrazine pollution. Average Nash-Sutcliffe efficiency (NSE) values of 0.56 and 0.70 were obtained for daily and monthly stream flow calibration, respectively, while those for validation were 0.55 and 0.79, respectively. The best NSE values ranged from 0.06 to 0.42 for daily atrazine calibrations at four locations within the watershed and from 0.01 to 0.29 during validation. Daily and monthly R2 values at the St. Joseph watershed outlet during the atrazine validation were 0.35 and 0.63, respectively. Although NSE values for some water quality stations were poor, predicted atrazine concentrations compared reasonably well to measured trends at the watershed outlet. Pollution peaks in simulated atrazine concentrations were also within days of measured atrazine concentrations. The research showed that the temporal and spatial trend of tillage activities, which influences the timing and location of atrazine applications, together with application amounts have significant impact on critical areas and concentration levels of atrazine pollution. Uncertainties in observed data could also affect the model outcome. The results showed the potential application in early warning prediction of atrazine pollution and can be used to make appropriate management decisions to mitigate this problem.
- Research Article
5
- 10.3390/su142416701
- Dec 13, 2022
- Sustainability
Accurately predicting network-level traffic conditions has been identified as a critical need for smart and advanced transportation services. In recent decades, machine learning and artificial intelligence have been widely applied for traffic state, including traffic volume prediction. This paper proposes a novel deep learning model, Graph Convolutional Neural Network with Data-driven Graph Filter (GCNN-DDGF), for network-wide multi-step traffic volume prediction. More specifically, the proposed GCNN-DDGF model can automatically capture hidden spatiotemporal correlations between traffic detectors, and its sequence-to-sequence recurrent neural network architecture is able to further utilize temporal dependency from historical traffic flow data for multi-step prediction. The proposed model was tested in a network-wide hourly traffic volume dataset between 1 January 2018 and 30 June 2019 from 150 sensors in the Los Angeles area. Detailed experimental results illustrate that the proposed model outperforms the other five widely used deep learning and machine learning models in terms of computational efficiency and prediction accuracy. For instance, the GCNN-DDGF model improves MAE, MAPE, and RMSE by 25.33%, 20.45%, and 29.20% compared to the state-of-the-art models, such as Diffusion Convolution Recurrent Neural Network (DCRNN), which is widely accepted as a popular and effective deep learning model.
- Research Article
17
- 10.5194/hess-25-6185-2021
- Dec 6, 2021
- Hydrology and Earth System Sciences
Abstract. Contamination of surface waters with microbiological pollutants is a major concern to public health. Although long-term and high-frequency Escherichia coli (E. coli) monitoring can help prevent diseases from fecal pathogenic microorganisms, such monitoring is time-consuming and expensive. Process-driven models are an alternative means for estimating concentrations of fecal pathogens. However, process-based modeling still has limitations in improving the model accuracy because of the complexity of relationships among hydrological and environmental variables. With the rise of data availability and computation power, the use of data-driven models is increasing. In this study, we simulated fate and transport of E. coli in a 0.6 km2 tropical headwater catchment located in the Lao People's Democratic Republic (Lao PDR) using a deep-learning model and a process-based model. The deep learning model was built using the long short-term memory (LSTM) methodology, whereas the process-based model was constructed using the Hydrological Simulation Program–FORTRAN (HSPF). First, we calibrated both models for surface as well as for subsurface flow. Then, we simulated the E. coli transport with 6 min time steps with both the HSPF and LSTM models. The LSTM provided accurate results for surface and subsurface flow, with Nash–Sutcliffe efficiency (NSE) values of 0.51 and 0.64, respectively. In contrast, the NSE values yielded by the HSPF were −0.7 and 0.59 for surface and subsurface flow. The simulated E. coli concentrations from LSTM provided an NSE of 0.35, whereas the HSPF gave an unacceptable performance with an NSE value of −3.01 due to the limitations of HSPF in capturing the dynamics of E. coli with land-use change. The simulated E. coli concentration showed the rise and drop patterns corresponding to annual changes in land use. This study showcases the application of deep-learning-based models as an efficient alternative to process-based models for E. coli fate and transport simulation at the catchment scale.
- Research Article
11
- 10.1016/j.jhydrol.2024.131923
- Sep 2, 2024
- Journal of Hydrology
Improving streamflow forecasting in semi-arid basins by combining data segmentation and attention-based deep learning
- Research Article
1
- 10.2166/wcc.2023.104
- Aug 22, 2023
- Journal of Water and Climate Change
Water quality has become a significant concern in many river basins in China due to both point and non-point source pollution. The SWAT model was used to assess pollution reduction scenarios and their effects on water quality in the Donghe River basin in southwest China. The calibrated model evaluated existing point and non-point emissions. Three schemes reduced point sources by 30, 60, and 90% and non-point sources by 25, 50, and 75%, respectively. Simulations analyzed annual and monthly total phosphorus (TP) concentrations under the scenarios. Results showed that the scenarios effectively improved water quality, meeting Class IV TP standards annually. However, TP exceeded standards in dry months (January–April, December) under all scenarios. A moderate negative correlation (R = −0.52, P = 0.11) between TP and rainfall suggests that rainfall influences TP. Comprehensive measures are needed to achieve standards year-round. In summary, the study found that reducing emissions improved Donghe water quality overall but more work is required to meet standards during dry periods. Rainfall correlates with and may affect TP. The work emphasizes implementing comprehensive approaches for year-round water quality improvements in the basin.
- Research Article
25
- 10.1016/j.ecolind.2023.110893
- Sep 7, 2023
- Ecological Indicators
Water scarcity risk through trade of the Yellow River Basin in China
- Research Article
18
- 10.1016/j.chemosphere.2023.139537
- Jul 19, 2023
- Chemosphere
Environmental exposure and ecological risk of perfluorinated substances (PFASs) in the Shaying River Basin, China
- Research Article
13
- 10.3390/w10030294
- Mar 9, 2018
- Water
Assessment of the response of streamflow to future climate change in headwater areas is of a particular importance for sustainable water resources management in a large river basin. In this study, we investigated multiscale variation in hydroclimatic variables including streamflow, temperature, precipitation, and evapotranspiration in the Headwater Areas of the Nenjiang River Basin (HANR) in China’s far northeast, which are sensitive to climate change. We analyzed 50-year-long (1961–2010) records of the hydroclimatic variables using the ensemble empirical mode decomposition (EEMD) method to identify their inherent changing patterns and trends at the inter-annual and inter-decadal scales. We found that all these hydroclimatic variables showed a clear nonlinear process. At the inter-annual and inter-decadal scales, streamflow had a similar periodic changing pattern and transition years to that of precipitation; however, within a period, streamflow showed a close association with temperature and evapotranspiration. The findings indicate that the response of streamflow in headwater regions to climate change is a nonlinear dynamic process dictated by precipitation at the decadal scale and modified by temperature and evapotranspiration within a decade.