About Some Data Precaution Techniques For K-Means Clustering Algorithm

Muazu Zulkifilu, Abdulkadir Yasir

doi:10.56919/usci.1122.003

Open Access

https://doi.org/10.56919/usci.1122.003

Journal: UMYU Scientifica
Publication Date: Sep 30, 2022
License type: CC BY-NC 4.0

Affiliation: Umaru Musa Yar'adua University

Abstract

Clustering is a technique for partitioning objects into groups such that the objects in each group are similar to one another and distinct from those in other groups. One of the most popular clustering techniques is the k-means clustering algorithm. Conventional k-means may not work well on high-dimensional datasets because of the noise, discrepancies, and outliers in the original data. Some form of transformation is therefore required to prepare the data for clustering. Four data pre-processing methods are applied before the clustering algorithm to make the data clean, noise-free, and consistent. The impact of data pre-processing on the basic k-means clustering algorithm was tested on real-life data using four normalization techniques: z-score, mean-max, decimal scaling, and mean absolute deviation. We find that pre-processing before clustering yields good clustering results and significantly reduces the running time compared to the traditional techniques. We also conclude that mean absolute deviation is the best of the four normalization methods, as it captures all clustering points.
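
To make the four normalization techniques concrete, the sketch below implements common textbook definitions of each in plain Python. It is an illustration only, not the authors' implementation: the function names are hypothetical, "mean-max" is assumed here to mean the usual min-max rescaling to [0, 1], the z-score uses the population standard deviation, and the mean absolute deviation is taken around the column mean. The paper's exact formulas may differ.

```python
from statistics import fmean, pstdev

def z_score(col):
    """Centre on the mean, divide by the (population) standard deviation."""
    mu, sigma = fmean(col), pstdev(col)
    return [(x - mu) / sigma if sigma else 0.0 for x in col]

def min_max(col):
    """Rescale linearly to [0, 1] (assumed meaning of 'mean-max')."""
    lo, hi = min(col), max(col)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in col]

def decimal_scaling(col):
    """Divide by 10**j, with j the smallest non-negative integer
    such that every scaled absolute value is below 1."""
    max_abs = max(abs(x) for x in col)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [x / 10 ** j for x in col]

def mean_abs_deviation(col):
    """Centre on the mean, divide by the mean absolute deviation."""
    mu = fmean(col)
    mad = fmean([abs(x - mu) for x in col])
    return [(x - mu) / mad if mad else 0.0 for x in col]

def normalize(rows, method):
    """Apply a column-wise method to each feature of a list of rows."""
    columns = [method(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]

# Hypothetical toy data: two features on very different scales.
rows = [[1, 200], [2, 300], [3, 400]]
scaled = normalize(rows, mean_abs_deviation)
```

Each function rescales one feature at a time, so no single large-valued feature dominates the Euclidean distances that k-means relies on. The normalized rows can then be passed to any k-means implementation in place of the raw data.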
