Accurate forecasting of water quality variables in river systems is crucial for administrators to identify potential water quality degradation and take countermeasures promptly. However, purely data-driven forecasting models are often insufficient to handle the highly variable periodicity of water quality in increasingly complex environments. This study presents a new holistic framework for time-series forecasting of water quality parameters that combines advanced deep learning algorithms (i.e., Long Short-Term Memory (LSTM) and Informer) with causal inference, time-frequency analysis, and uncertainty quantification. The framework was demonstrated for total nitrogen (TN) forecasting in one of the largest artificial lakes in Asia (the Danjiangkou Reservoir, China) with six-year monitoring data from January 2017 to June 2022. The results showed that pre-processing techniques based on causal inference and wavelet decomposition can significantly improve the performance of deep learning algorithms. Compared to the individual LSTM and Informer models, the wavelet-coupled approaches substantially reduced the forecasting errors of TN concentrations, with reductions of up to 24.39%, 32.68%, and 41.26% in the average, standard deviation, and maximum of the errors, respectively. In addition, a post-processing algorithm based on the Copula function and Bayesian theory was designed to quantify the uncertainty of predictions. With this algorithm, each deterministic prediction of the model is accompanied by a range of possible outputs. The 95% forecast confidence interval covered almost all the observations, which indicates the reliability and robustness of the predictions. This study provides scientific references for applying advanced data-driven methods in time-series forecasting tasks and a practical methodological framework for water resources management and similar projects.
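The error statistics and interval coverage reported above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the errors are absolute differences between observed and predicted values and that the interval bounds are already available, and it does not reproduce the Copula and Bayesian post-processing. The function names (`error_summary`, `percent_reduction`, `interval_coverage`) are hypothetical, and all numbers are made up for illustration rather than taken from the study.

```python
from statistics import mean, pstdev


def error_summary(observed, predicted):
    """Mean, standard deviation, and maximum of absolute forecast errors.

    Assumption: absolute error is the metric; the study may define it differently.
    """
    errors = [abs(o - p) for o, p in zip(observed, predicted)]
    return {"mean": mean(errors), "std": pstdev(errors), "max": max(errors)}


def percent_reduction(baseline, improved):
    """Relative reduction (in %) of each error statistic versus a baseline model."""
    return {k: 100.0 * (baseline[k] - improved[k]) / baseline[k] for k in baseline}


def interval_coverage(observed, lower, upper):
    """Fraction of observations that fall inside their [lower, upper] interval."""
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return inside / len(observed)


# Made-up TN concentrations (mg/L), for illustration only.
observed = [1.20, 1.35, 1.10, 1.50]
baseline_pred = [1.00, 1.55, 1.30, 1.10]  # stands in for a standalone model
coupled_pred = [1.10, 1.45, 1.20, 1.30]   # stands in for a wavelet-coupled model

baseline_stats = error_summary(observed, baseline_pred)
coupled_stats = error_summary(observed, coupled_pred)
reduction = percent_reduction(baseline_stats, coupled_stats)

# Hypothetical symmetric interval of +/- 0.15 mg/L around each prediction.
lower = [p - 0.15 for p in coupled_pred]
upper = [p + 0.15 for p in coupled_pred]
coverage = interval_coverage(observed, lower, upper)

print(reduction)
print(coverage)
```

A coverage value close to the nominal level (0.95 for a 95% interval) is what suggests the interval is well calibrated; coverage far below it would indicate intervals that are too narrow.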