Application of ensemble methods for classification of water quality

Mohamad Sakizadeh

doi:10.1504/ijw.2017.10004524

Abstract

Groundwater pollution in the Shoosh Aquifer, located in Khuzestan Province, Iran, was investigated using an eight-year data set collected from 30 sampling wells. Cluster analysis produced a dendrogram in which the 30 sampling wells were grouped into three statistically significant clusters. Two classification methods, k-nearest neighbour and classification tree, were used to classify the sampling stations according to their level of pollution. The optimum tree depth and number of neighbours were determined by 4-fold cross-validation misclassification error; both classifiers had an error of 0.167. Ensembles were then created using these base classifiers. In addition, given the small sample size of the data in this study, random subspace as a feature selection method was combined with the k-nearest neighbour ensemble. The misclassification errors of the classification tree and k-nearest neighbour ensembles were 0.13 and 0.10, respectively. Both ensemble errors are lower than the 0.167 error of the single classifiers, which confirms the high accuracy of ensemble methods for data classification.
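The random subspace idea and the 4-fold misclassification error can be illustrated with a minimal sketch. It is not the study's implementation: the data are synthetic (30 hypothetical wells, six made-up features, three classes), and the subspace size, the number of ensemble members, k = 3 and the fold assignment are all assumptions chosen for illustration. Each ensemble member is a k-nearest-neighbour classifier that sees only a random subset of the features, and the members vote. As a side note on the arithmetic, if the reported errors are pooled over all 30 wells, 0.167, 0.13 and 0.10 would correspond to 5, 4 and 3 misclassified wells.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """Majority vote among the k nearest training rows, using only `features`."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]


def subspace_knn_predict(train_X, train_y, x, k, subspaces):
    """Random subspace ensemble: one k-NN vote per feature subset, then majority vote."""
    votes = Counter(knn_predict(train_X, train_y, x, k, fs) for fs in subspaces)
    return votes.most_common(1)[0][0]


def kfold_error(X, y, predict, n_folds=4):
    """Pooled misclassification error; every sample is held out exactly once."""
    wrong = 0
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        test = [i for i in range(len(X)) if i % n_folds == fold]
        tr_X = [X[i] for i in train]
        tr_y = [y[i] for i in train]
        wrong += sum(predict(tr_X, tr_y, X[i]) != y[i] for i in test)
    return wrong / len(X)


# Synthetic stand-in for the well data (assumption, not the study's data):
# two informative features whose mean depends on the class, four noise features.
rng = random.Random(0)
N_WELLS, N_FEATURES, K = 30, 6, 3
y = [i % 3 for i in range(N_WELLS)]
X = [
    [rng.gauss(3.0 * label if f < 2 else 0.0, 1.0) for f in range(N_FEATURES)]
    for label in y
]

# 15 ensemble members, each restricted to 3 randomly chosen features (assumed sizes).
subspaces = [rng.sample(range(N_FEATURES), 3) for _ in range(15)]

all_features = list(range(N_FEATURES))
single_error = kfold_error(
    X, y, lambda tX, ty, x: knn_predict(tX, ty, x, K, all_features)
)
ensemble_error = kfold_error(
    X, y, lambda tX, ty, x: subspace_knn_predict(tX, ty, x, K, subspaces)
)
print(f"single k-NN 4-fold error: {single_error:.3f}")
print(f"random-subspace k-NN 4-fold error: {ensemble_error:.3f}")
```

On real data, the number of neighbours (and, for a classification tree, the depth) would be chosen by repeating `kfold_error` over a range of candidate values and keeping the value with the lowest error, which mirrors the selection step described in the abstract.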
