With the development of society and accelerating industrialization, water pollution has become an increasingly prominent problem. To stop harmful substances from accumulating and diffusing in water bodies, which would further degrade water quality and cause more serious environmental problems, environmental management departments have developed a series of pollutant discharge standards intended to prevent water pollution in real time. Common testing methods are the colorimetric method and the TDS (total dissolved solids) method, which assess water quality mainly by measuring the concentrations of indicators such as acids, alkalis, and salts in the water. However, these traditional methods have shortcomings in both measurement time and accuracy. Rapid detection of the concentrations of water quality indicators is urgently needed so that highly polluted water bodies can be responded to and treated in time. In this paper, we propose a water quality detection and classification model based on multimodal machine learning algorithms. First, we preprocessed and analyzed the collected water quality dataset and determined a reasonable and complete set of factors influencing water quality classification. Then, we built 15 classification models based on machine learning algorithms for water quality detection and evaluated the performance of each, comparing the true values with each model's predicted values on four evaluation metrics: precision, recall, F1 score, and accuracy. The experimental results show that sulfate, pH, solids, and hardness are the important influencing factors in water quality testing, and that three models, XGBoost (Extreme Gradient Boosting), CatBoost (Categorical Boosting), and LGBM (Light Gradient Boosting Machine), perform better than the others. Finally, we further optimized the XGBoost, CatBoost, and LGBM classification models using two tools: cross-validation and hyperparameter tuning.
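As an illustration of the four evaluation metrics named above, the following is a minimal sketch of how they can be computed from true and predicted labels. It assumes a binary classification task with label 1 as the positive class; the abstract does not state the label encoding, and the function name and sample labels here are hypothetical, not taken from the paper.

```python
from typing import Dict, Sequence


def binary_classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Compute precision, recall, F1, and accuracy for binary labels.

    Assumption: labels are 0/1 and 1 is the positive class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(y_true)

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical labels, for illustration only (not data from the paper).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
metrics = binary_classification_metrics(y_true, y_pred)
```

Reporting all four metrics rather than accuracy alone matters when the classes are imbalanced, since a model can reach high accuracy while missing most positive samples.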