A combined water quality pollution prediction model based on the Spark big data platform

Zhihui Sun, Yiqing Fan

doi:10.2166/aqua.2022.036

Abstract

Water quality prediction is fundamental to water resource management and pollution control, so it is crucial to accurately predict how pollutant concentrations in water bodies change over time. Such predictions provide data support for the effective assessment of water quality and are also an indirect means of protecting water resources and the environment. A variety of water quality prediction methods exist, but they still have shortcomings. In this paper, data for the main water quality pollution indicators, namely dissolved oxygen (DO), ammonia nitrogen (NH3-N) and total phosphorus (P), are used to build a water quality prediction model. These indicators exhibit numerous nonlinear correlation characteristics, which lead to low training efficiency on large-scale data. Therefore, a combined water quality prediction model based on ensemble empirical mode decomposition (EEMD) and a cascade support vector machine (Cascade SVM) is proposed. First, EEMD is applied to bring out the underlying characteristics of the original water quality data series. Then, Spark, a distributed computing engine, is used to parallelize the traditional Cascade SVM so that training and prediction run in parallel. The experimental results show that the proposed combined model has a clear advantage in several aspects of performance, such as training efficiency and prediction accuracy.
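To make the cascade idea concrete, the following is a minimal, single-machine sketch of a single-pass Cascade SVM for regression. It is not the authors' implementation: it assumes a toy linear epsilon-SVR trained by subgradient descent (a stand-in for a kernel SVR), hypothetical helper names (`train_linear_svr`, `support_vectors`, `cascade_svr`), and no feedback loop. The data is split into disjoint subsets, each subset is reduced to its support vectors (samples on or outside the epsilon-tube), neighbouring reduced subsets are merged, and the process repeats until one subset remains for the final model. Because each first-layer subset is trained independently, that is the step one would plausibly distribute with Spark (for example over RDD partitions via `mapPartitions`); here it simply runs sequentially.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], float]  # (feature vector, target value)
Model = Tuple[List[float], float]   # (weights w, bias b)


def predict(model: Model, x: List[float]) -> float:
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svr(data: List[Sample], eps: float = 0.1, lam: float = 0.01,
                     epochs: int = 100, lr: float = 0.01, seed: int = 0) -> Model:
    """Toy linear epsilon-SVR: SGD on lam/2*||w||^2 + mean(max(0, |y - f(x)| - eps))."""
    rng = random.Random(seed)
    w, b = [0.0] * len(data[0][0]), 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            r = y - predict((w, b), x)
            # Subgradient of the epsilon-insensitive loss w.r.t. the prediction f(x)
            g = -1.0 if r > eps else (1.0 if r < -eps else 0.0)
            w = [wj - lr * (lam * wj + g * xj) for wj, xj in zip(w, x)]
            b -= lr * g
    return w, b


def support_vectors(data: List[Sample], model: Model, eps: float,
                    tol: float = 1e-3) -> List[Sample]:
    """Samples on or outside the eps-tube, i.e. the points that shape the fit."""
    return [(x, y) for x, y in data if abs(y - predict(model, x)) >= eps - tol]


def cascade_svr(data: List[Sample], n_parts: int = 4,
                eps: float = 0.1) -> Tuple[Model, int]:
    """Single-pass cascade: reduce each subset to its support vectors, merge
    neighbouring pairs, repeat until one subset remains, then train on it."""
    # Layer 0: disjoint partitions (in a Spark job these could be RDD partitions).
    layer = [data[i::n_parts] for i in range(n_parts)]
    while len(layer) > 1:
        # Subsets are trained independently, so this step is parallelizable.
        # Safeguard: if a sub-model finds no support vectors, keep the subset.
        filtered = [support_vectors(s, train_linear_svr(s, eps=eps), eps) or s
                    for s in layer]
        # Merge pairs (0,1), (2,3), ...; an odd last subset passes through alone.
        layer = [filtered[i] + (filtered[i + 1] if i + 1 < len(filtered) else [])
                 for i in range(0, len(filtered), 2)]
    final_data = layer[0]
    return train_linear_svr(final_data, eps=eps), len(final_data)


if __name__ == "__main__":
    # Synthetic series y = 2x + 1 + uniform noise (illustrative data only).
    rng = random.Random(42)
    data = []
    for _ in range(400):
        x = rng.random()
        data.append(([x], 2.0 * x + 1.0 + rng.uniform(-0.3, 0.3)))
    model, n_final = cascade_svr(data, n_parts=4, eps=0.1)
    print("w =", model[0], "b =", model[1], "final training set size =", n_final)
```

The design choice the sketch illustrates is that only support vectors influence an SVM's solution, so discarding the remaining samples at each layer shrinks the final training set while approximately preserving the fit. In the paper's setting the inputs would presumably be the EEMD components of the DO, NH3-N and P series rather than this synthetic line.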
