Abstract

Data clustering is a data science method widely used in data analysis to group the records of a dataset and to find patterns or relationships among them. This research aims to evaluate the accuracy of data clustering using the K-Means algorithm on the wine dataset, which has 13 features describing the chemical characteristics of three types of wine. The clustering process should yield the best possible clustering evaluation metrics. Evaluation is carried out by comparing the K-Means clustering results under two metrics, the Davies-Bouldin index and the Silhouette score. The research steps involve data standardization, selection of the optimal number of clusters, and assessment of clustering accuracy. The research method follows Knowledge Discovery in Databases (KDD), which consists of pre-processing, transformation, model building, and model evaluation. Experimental results show that appropriate parameters and cluster initialization can improve the clustering evaluation metrics. On the normalized dataset, the Davies-Bouldin index indicates 2 groups and the Silhouette score indicates 3 groups. Before normalization, the Davies-Bouldin index indicates 7 groups and the Silhouette score indicates 2 groups. In conclusion, the evaluation metrics differ between the normalized and non-normalized datasets. The number of groups to choose depends on the context of the data analysis performed; here 3 groups were selected, which can be labelled "Superior Variety", "Intermediate Variety", and "Standard Variety".
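The workflow summarized in the abstract (standardize, fit K-Means for a range of cluster counts, score each fit with Davies-Bouldin and Silhouette) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes that scikit-learn's `load_wine` matches the wine dataset used in the study, that "normalization" means z-score standardization via `StandardScaler`, and that cluster counts from 2 to 10 are searched. The helper names `score_k_range` and `best_k`, the seed, and `n_init=10` are hypothetical choices, so the preferred cluster counts it prints may differ from those reported above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler


def score_k_range(X, k_values, seed=0):
    """Fit K-Means for each k and return {k: (davies_bouldin, silhouette)}."""
    scores = {}
    for k in k_values:
        # n_init and random_state are illustrative choices, not from the paper.
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = (davies_bouldin_score(X, labels), silhouette_score(X, labels))
    return scores


def best_k(scores):
    """Return (k with lowest Davies-Bouldin, k with highest Silhouette)."""
    k_db = min(scores, key=lambda k: scores[k][0])
    k_sil = max(scores, key=lambda k: scores[k][1])
    return k_db, k_sil


# Assumption: scikit-learn's bundled wine data (178 samples, 13 features).
X_raw = load_wine().data
X_std = StandardScaler().fit_transform(X_raw)  # zero mean, unit variance per feature

k_values = range(2, 11)  # assumed search range
for name, X in [("raw", X_raw), ("standardized", X_std)]:
    k_db, k_sil = best_k(score_k_range(X, k_values))
    print(f"{name}: Davies-Bouldin prefers k={k_db}, Silhouette prefers k={k_sil}")
```

A lower Davies-Bouldin index and a higher Silhouette score both indicate more compact, better separated clusters, which is why the two metrics are optimized in opposite directions and can disagree on the preferred number of groups.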

