Cluster and Outlier Analysis for Ground Water Quality Data in the Regions of Kadapa District in Andhra Pradesh

S.V.S Ganga Devi

doi:10.2174/1872212113666190211144935

Abstract

Background: Patents suggest that groundwater contaminated with chemicals, bacteria, oils, gases and similar pollutants leads to many types of disease in people. Fresh, clean water plays a significant role in human life. In this study, water samples were collected from different regions of the Kadapa district, Andhra Pradesh.

Methods: Water samples were collected in tightly capped plastic bottles that had been washed with distilled water. In total, 57 samples were collected and analyzed in the laboratory for physicochemical properties: Electrical Conductivity (EC), pH, Total Hardness (TH), Total Dissolved Solids (TDS), Ca, Cl and F. In this paper, K-means clustering, K-medoids clustering and hierarchical clustering are used to group the sampled regions according to water quality. Outlier analysis was then carried out, and various interesting patterns were identified.

Results: According to the calculated Water Quality Index (WQI) values, all the collected samples were suitable for drinking; the data set contained 13 poor tuples, 13 good tuples and 31 excellent tuples. K-means clustering produced 3 clusters of sizes 8, 17 and 32. According to the outlier analysis, the sample from the Pullareddypet region (sample No. 7) had the highest EC, TH and TDS values among the 57 collected water samples. The sample from the Veerapalli region (sample No. 37) had the highest fluoride value, 3.58, among all 57 samples.

Conclusion: Unsupervised learning methods, namely K-means clustering, K-medoids clustering and hierarchical clustering, are described for analyzing the physicochemical parameters of the collected water samples. The cluster analysis results were compared with the calculated WQI values. The three clusters overlapped with each other to a small degree. In the study area, only tuples in the excellent, good and poor categories were found for drinking purposes. Outlier analysis is then described using the box plot method and the K-means clustering method. Outlier analysis based on K-means clustering extracted various interesting hidden patterns from the data.
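The two outlier techniques named in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact rules used, so the code below assumes the common 1.5 × IQR box plot rule and, for the K-means variant, ranks points by distance to their assigned centroid. The function names, the fixed seed indices and all numeric values are hypothetical and are not taken from the study's 57 samples.

```python
import math
import statistics


def boxplot_outliers(values, whisker=1.5):
    """Indices of values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def kmeans(points, seed_indices, iterations=50):
    """Plain Lloyd's algorithm, seeded from the given point indices."""
    centroids = [list(points[i]) for i in seed_indices]
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def kmeans_outliers(points, labels, centroids, top_n=1):
    """Indices of the top_n points farthest from their own centroid."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    ranked = sorted(range(len(points)), key=lambda i: dists[i], reverse=True)
    return ranked[:top_n]


# Hypothetical single-parameter readings (illustrative only).
readings = [0.4, 0.5, 0.6, 0.5, 0.7, 0.6, 0.5, 3.0]
box_flags = boxplot_outliers(readings)

# Hypothetical two-parameter samples: three tight groups plus one distant point.
samples = [
    (1.0, 1.1), (1.2, 0.9), (0.9, 1.0),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
    (9.0, 9.1), (9.2, 8.9), (8.9, 9.0),
    (12.5, 9.0),
]
labels, centroids = kmeans(samples, seed_indices=(0, 3, 6))
cluster_flags = kmeans_outliers(samples, labels, centroids, top_n=1)

print("box plot outlier indices:", box_flags)
print("cluster labels:", labels)
print("K-means outlier indices:", cluster_flags)
```

In practice the parameters would typically be standardized before clustering, since quantities such as EC and pH are measured on very different scales; that step is omitted here for brevity.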
