Abstract
Clean water is a fundamental human need, as water makes up 60 to 70% of total body weight. It is therefore important to be able to determine the quality of the water entering the body: consuming unsafe water causes diseases such as diarrhea and, in severe cases, can lead to death. This study investigates the factors that determine the potability of drinking water. Specifically, it aims to produce a fault detection algorithm that determines the potability of water samples using Principal Component Analysis (PCA) and an entropy-based subset selection method. PCA assumes linear relationships among variables, which real data often violate. This paper addresses that problem by finding a subset of the data whose parameters have a good entropy relation with one another, so that linearity is maintained in the data used for PCA. Nine parameters were considered in this research: pH, hardness, total dissolved solids, chloramines, sulfate, conductivity, organic carbon, trihalomethanes, and turbidity. The experiment was conducted with 811 water samples, of which 645 were used to train the model and the remaining 166 to validate its predictive accuracy. The experiments confirm that the proposed algorithm can determine the potability of drinking water samples from synthetic data sourced from India with an accuracy of over 98% for potable water data and 100% for non-potable water data.
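As a rough illustration of PCA-based fault detection in general, the sketch below fits PCA on reference samples and flags new samples whose squared prediction error (SPE) is unusually large. It is not the authors' algorithm: the abstract does not state the monitoring statistic, the threshold rule, or the number of components, so the SPE statistic, the empirical 99% quantile limit, `n_components=2`, and all function names here are assumptions. The entropy-based subset selection step is not shown.

```python
import numpy as np

def spe(Z, P):
    """Squared prediction error: squared distance of each row from the PCA subspace."""
    residual = Z - Z @ P @ P.T
    return np.sum(residual ** 2, axis=1)

def fit_pca_monitor(X_train, n_components=2, quantile=0.99):
    """Fit PCA on reference samples and derive an SPE control limit.

    n_components and quantile are illustrative choices, not values from the paper.
    """
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std  # standardize each parameter
    # Principal directions are the right singular vectors of the scaled data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T  # loadings, shape (n_features, n_components)
    # Empirical control limit taken from the training SPE values (assumption).
    threshold = np.quantile(spe(Z, P), quantile)
    return {"mean": mean, "std": std, "P": P, "threshold": threshold}

def is_faulty(model, X):
    """Return a boolean array: True where a sample exceeds the SPE limit."""
    Z = (X - model["mean"]) / model["std"]
    return spe(Z, model["P"]) > model["threshold"]
```

Under these assumptions, a model fitted on samples of one class flags samples that break the learned linear structure. This is also why the subset fed to PCA needs to be close to linear in the first place.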