The Study on the Accuracy of Classifiers for Water Quality Application

Rosaida Rosly, Mokhairi Makhtar, Mustafa Mat Deris, M Nordin A Rahman, Mohd Khalid Awang

doi:10.14257/ijunesst.2015.8.3.13

Abstract

Dirty water is the world's biggest health risk. When rainwater runs off roads into rivers, it picks up toxic chemicals, dirt, trash and disease-carrying organisms along the way. Many of our water resources lack basic protections, making them vulnerable to pollution from factory farms and industrial plants. A classification model is therefore needed to describe the quality of the water environment. This paper applies data mining classification methods to a water quality application. Various classifiers were studied in order to find the most accurate classifier for the dataset. This paper compares the accuracies of five classifiers (NB, MLP, J48, SMO, and IBk), using 10-fold cross validation as the test method, on water quality data from the Kinta River, Perak, Malaysia. This study also explores which classifier is suitable for classifying the dataset. The selected attributes used in this study were: DO Sat, DO Mgl, BOD Mgl, COD Mgl, TS Mgl, DO Index, AN Index, SS Index, Class, and Degree of pollution. The data consisted of 166 instances and were obtained from the East Coast Environmental Research Institute (ESERI) of Universiti Sultan Zainal Abidin (UniSZA). MLP and IBk performed better than the other classifiers on the Kinta River dataset, both achieving the highest accuracy of 91.57%. In the future, we will propose a multiclassifier approach that introduces fusion at the classification level between these classifiers to obtain higher classification accuracy.
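The 10-fold cross validation procedure mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' experimental setup: the data below are synthetic (two well-separated clusters, sized to 166 instances only to mirror the dataset size), the helper names `k_fold_indices`, `predict_1nn`, `cross_val_accuracy` and `make_toy_data` are hypothetical, and a plain 1-nearest-neighbour rule stands in for an instance-based classifier such as IBk.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def predict_1nn(train_X, train_y, x):
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def cross_val_accuracy(X, y, k=10, seed=0):
    """Hold out each fold once, train on the rest, and pool the correct predictions."""
    folds = k_fold_indices(len(X), k, seed)
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct += sum(predict_1nn(train_X, train_y, X[i]) == y[i] for i in fold)
    return correct / len(X)


def make_toy_data(n=166, seed=1):
    """Synthetic two-class data (assumption: NOT the Kinta River measurements)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 5.0
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
accuracy = cross_val_accuracy(X, y, k=10)
print(f"10-fold CV accuracy (1-NN, synthetic data): {accuracy:.2%}")
```

Because every instance is held out exactly once, the pooled accuracy is the number of correct predictions divided by the number of instances. On 166 instances, a reported accuracy of 91.57% corresponds to 152 correctly classified instances (152/166 ≈ 0.9157).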
