Data-driven and knowledge-guided denoising diffusion model for flood forecasting

Pingping Shao, Jun Feng, Jiamin Lu, Pengcheng Zhang, Chenxin Zou

doi:10.1016/j.eswa.2023.122908

Abstract

Data-driven models have been successfully applied in hydrological fields such as flood forecasting. However, several limitations remain in this field. Data collection is time-consuming and expensive, the quality of the collected data cannot be guaranteed, and noise or outliers in the dataset may lead to incorrect results. Moreover, data-driven models are trained only on the available datasets and do not incorporate scientific principles or laws during training, so their predictions for specific scientific problems may not conform to physical laws. We therefore propose a data-driven and knowledge-guided denoising diffusion (DK-Diffusion) model. First, for the data preprocessing stage, we propose a coupled heterogeneous mapping tensor decomposition complementary algorithm. This algorithm integrates the spatial information of the watershed and reduces the loss of potential correlations in the data caused by tensor decomposition, thereby better optimizing the initial conditions of the model. Second, we introduce an attention mechanism into the denoising diffusion probabilistic model (DDPM) to better capture medium- and long-term correlations in flood processes. Most importantly, guided by flood physics theory, we design the loss function of the proposed model so that its predictions are more consistent with the physical laws of floods. This is an innovative improvement with greater practical engineering value: it optimizes the boundary conditions of the model, which improves its generalization ability and reduces its dependence on data. In comparative experiments on datasets from the Qijiang and Tunxi basins in China, DK-Diffusion reduced the root mean square error (RMSE) by 20.3–27.7% and the mean absolute percentage error (MAPE) by 4.2–4.3% compared with the popular flood forecasting model AGCLSTM.
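The abstract does not give the form of the knowledge-guided loss. The following is a minimal NumPy sketch, assuming the standard DDPM noise-prediction objective with a linear beta schedule, to which a physics penalty is added on the clean signal implied by the predicted noise. The physics term used here (discharge cannot be negative), the weight `lam`, and all function names are hypothetical illustrations, not the authors' formulation; the attention mechanism and the tensor decomposition step are not shown.

```python
import numpy as np

def make_alpha_bar(T=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, eps):
    """Forward diffusion: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def predict_x0(x_t, t, alpha_bar, eps_hat):
    """Invert the forward step to estimate x0 from x_t and the predicted noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])

def physics_penalty(q_hat):
    """Hypothetical physics term: discharge cannot be negative, so the
    squared negative parts of the predicted flow are penalized."""
    return float(np.mean(np.maximum(-q_hat, 0.0) ** 2))

def knowledge_guided_loss(eps, eps_hat, x_t, t, alpha_bar, lam=0.1):
    """DDPM noise-prediction MSE plus a weighted physics penalty evaluated
    on the implied clean prediction x0_hat (lam is a hypothetical weight)."""
    denoise = float(np.mean((eps - eps_hat) ** 2))
    x0_hat = predict_x0(x_t, t, alpha_bar, eps_hat)
    return denoise + lam * physics_penalty(x0_hat)

# Usage with synthetic data: a positive "discharge" series and a perfect noise estimate.
rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = 100.0 + 50.0 * rng.random(24)  # synthetic flow values, all positive
eps = rng.standard_normal(24)
t = 10
x_t = q_sample(x0, t, alpha_bar, eps)
print(knowledge_guided_loss(eps, eps, x_t, t, alpha_bar))  # 0.0 when eps_hat == eps
```

Computing the penalty on `x0_hat` rather than on the noise keeps the physical constraint in the units of the forecast variable, while the denoising term remains the usual DDPM objective.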
Compared with the conditional score-based diffusion model for probabilistic time series imputation (CSDI), the average RMSE and the mean summed continuous ranked probability score (CRPS_sum) were reduced by 6.3–10.6% and 6.1–6.2%, respectively.
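The metrics behind these comparisons can be sketched in NumPy as follows. This is a sketch under several assumptions. MAPE is expressed in percent, and the reported reductions are treated as relative percentages; the abstract does not say whether the MAPE reduction is relative or in percentage points. CRPS_sum follows the convention used with CSDI: forecasts and targets are summed over the feature (station) dimension before scoring, and the result is normalized by the summed absolute targets. Some implementations approximate CRPS with averaged quantile losses, whereas this sketch uses the plug-in sample form E|X - y| - 0.5 E|X - X'|. All function names are illustrative.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mape(y, y_hat):
    """Mean absolute percentage error in percent (assumes no zero targets)."""
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))

def crps_samples(samples, y):
    """Plug-in sample estimate of CRPS for a scalar target:
    E|X - y| - 0.5 * E|X - X'|, with X, X' drawn from the forecast samples."""
    s = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(s - y))
    term2 = np.mean(np.abs(s[:, None] - s[None, :]))
    return float(term1 - 0.5 * term2)

def crps_sum(samples, y):
    """samples: (n_samples, T, K) forecasts; y: (T, K) observations.
    Sum over the K features, score each time step, normalize by sum |target|."""
    s = samples.sum(axis=2)  # (n_samples, T)
    z = y.sum(axis=1)        # (T,)
    total = sum(crps_samples(s[:, i], z[i]) for i in range(z.shape[0]))
    return float(total / np.sum(np.abs(z)))

def relative_reduction(baseline, proposed):
    """Percentage reduction of a lower-is-better metric relative to a baseline."""
    return 100.0 * (baseline - proposed) / baseline

# A two-point forecast {0, 2} for an observation of 1:
print(crps_samples(np.array([0.0, 2.0]), 1.0))  # 0.5
```

For a deterministic forecast (all samples equal), `crps_samples` reduces to the absolute error, which is why CRPS is often read as a probabilistic generalization of MAE.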
