Binning-Based Silhouette Approach to Find the Optimal Cluster Using K-Means

Akash Punhani, Krishna Kumar Mishra, Neetu Faujdar, Manoharan Subramanian

doi:10.1109/access.2022.3215568

Abstract

Clustering is a core task in machine learning. K-means is the standard technique that data analysts use to partition data into clusters. Although the K-means algorithm works effectively, the value of k must be tuned to the dataset under consideration. Tuning requires executing K-means with different values of k and selecting the value that yields the best cluster quality according to a chosen metric. The elbow method and the silhouette coefficient are the most popular approaches for selecting the number of clusters. However, both are time-consuming, because they are iterative: K-means must be executed for each candidate value of k to find a good score. This paper proposes a divide-and-conquer approach that performs the same task in less time. With the proposed approach, the task completed 2.5 times faster than with the iterative method, at the cost of some additional memory to store intermediate results.
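For orientation, the iterative baseline described in the abstract can be sketched as follows. This is a minimal illustration and not the paper's binning-based method: it runs K-means once per candidate k and keeps the k with the highest mean silhouette coefficient. The function names (`kmeans`, `silhouette`, `best_k_iterative`), the deterministic farthest-first initialization, and the synthetic three-blob data are all assumptions made for this sketch.

```python
import math
import random


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per point.

    Assumption for this sketch: centers are seeded by farthest-first
    traversal so that the result is deterministic.
    """
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # the coefficient is undefined for a single cluster
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute a score of 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k_iterative(points, k_values):
    """Run K-means for every candidate k and keep the best-scoring one."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Synthetic data (assumption): three well-separated 2-D blobs.
rng = random.Random(42)
blob_centers = [(0, 0), (10, 10), (0, 10)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in blob_centers for _ in range(30)]

best, scores = best_k_iterative(points, range(2, 7))
print(best, scores)
```

Each candidate k triggers a full K-means run followed by a silhouette computation that is quadratic in the number of points, which is the cost the proposed divide-and-conquer approach aims to reduce.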

Full Text