Abstract

The democratization of sensing devices in industrial systems has made it possible to collect large amounts of data of different types, which in turn calls for complex analyses to extract knowledge. The field of water resources is one of the areas that has drawn the attention of decision-makers seeking to preserve human health and safety. Recent advances in Artificial Intelligence, particularly in Machine Learning, open the possibility of leveraging massive data to better understand the relationship between water quality and human activities. However, the high rate of missing data and the heterogeneity of the measurements are scientific issues that standard methods cannot solve, especially when no prior knowledge of the label of each observation is available. In this article, a Prognostics and Health Management (PHM) approach was implemented to detect and diagnose anomalies in water quality datasets, taking into account the uncertainties induced by these issues. Fuzzy c-means was used to identify the different water quality classes, while Random Forest was applied to determine the most influential parameters with respect to potential contamination of water resources in the southwest of France. The results suggest that multiple imputation methods can handle the missing data, while decision rules based on well-known water quality standards can compensate for the lack of labelled observations. In addition, two potential sources of contamination (atrazine and nitrate) were identified and then validated by hydrogeology experts, prior to further online deployment of the proposed model.
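As a rough illustration of the clustering step mentioned above, the following is a minimal, generic fuzzy c-means sketch in pure Python. It is not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, the stopping tolerance, and the toy two-dimensional points are all illustrative assumptions, and the points are not water quality data. The key property it shows is that each observation receives a degree of membership in every class rather than a single hard label, and no labels are needed to compute them.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Minimal fuzzy c-means (illustrative sketch): returns (centers, memberships).

    points: list of equal-length feature vectors (floats)
    c: number of clusters
    m: fuzzifier (> 1); larger values give softer memberships (assumed default 2.0)
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    power = 2.0 / (m - 1.0)
    centers = []
    for _ in range(max_iter):
        # Center update: membership-weighted mean with weights u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / wsum
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if 0.0 in dists:
                # Point sits exactly on a center: share membership among those centers.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dists[j] / dists[k]) ** power for k in range(c))
                              for j in range(c)])
        # Stop once no membership changes by more than tol.
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Hypothetical toy data (two well-separated groups), not from the study.
points = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
centers, memberships = fuzzy_c_means(points, c=2)
# Hard class per point, taken as the cluster with the highest membership.
labels = [max(range(2), key=lambda j: row[j]) for row in memberships]
print(labels)
```

Taking the argmax of each membership row, as in the last lines, gives a hard class assignment when one is needed. With real measurements, features would normally be scaled first, since the Euclidean distance used here is sensitive to units.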
