Abstract
Clustering is the process of grouping data into clusters so that the data within one cluster are maximally similar and the data in different clusters are minimally similar. Clustering methods have developed along two approaches: partitioning and hierarchical clustering [1]. The Water Quality Status dataset has 8 attributes, 4 classes, and 120 instances, with an even class distribution: good condition (30 instances), lightly polluted (30 instances), medium polluted (30 instances), and heavily polluted (30 instances). The data were split at random, with 70% used as training data and 30% as test data. To simplify the performance calculation of the clustering model, the research was implemented using MATLAB functions. With the number of iterations set from 100 to 2,500, the method produced 10 clusters. When the number of iterations was raised to 5,000, the number of clusters changed to 9. Re-testing with 10,000 to 50,000 iterations did not change the number of clusters produced. The testing therefore concludes that the AP method produces an optimal number of clusters of 10.
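The 70/30 random split described above can be illustrated with a short sketch. This is not the authors' MATLAB implementation. The label list, the `random_split` helper, and the fixed seed are hypothetical stand-ins chosen only to show the arithmetic: 120 instances give 84 training and 36 test instances.

```python
import random

# Hypothetical stand-in for the dataset labels: 4 classes x 30 instances = 120.
CLASSES = ["good condition", "lightly polluted", "medium polluted", "heavily polluted"]
labels = [name for name in CLASSES for _ in range(30)]


def random_split(n_items, train_fraction=0.7, seed=0):
    """Shuffle the indices 0..n_items-1 and cut them into train/test lists."""
    indices = list(range(n_items))
    # A seeded generator keeps the shuffle reproducible (the seed is an assumption).
    random.Random(seed).shuffle(indices)
    n_train = round(n_items * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = random_split(len(labels))
print(len(train_idx), len(test_idx))  # 84 36
```

Because the split is purely random rather than stratified, the number of training instances per class can differ from run to run. Only the totals, 84 and 36, are fixed.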