High-density cluster core-based k-means clustering with an unknown number of clusters

Abhimanyu Kumar, Abhishek Kumar, Rammohan Mallipeddi, Dong-Gyu Lee

doi:10.1016/j.asoc.2024.111419

Abstract

The k-means algorithm, valued for its simplicity and adaptability, requires the number of clusters to be chosen manually and is sensitive to initial centroid placement. This paper introduces a framework that addresses both limitations. It combines a data-driven method for estimating the number of clusters with a robust initialization strategy based on high-density cluster cores, allowing k-means to run in a fully unsupervised manner and to perform well even when clusters overlap. A novel density-based technique is used to identify the cluster cores accurately. Experiments on synthetic and real-world datasets show an average improvement of 15% in Adjusted Rand Index on datasets with overlapping clusters, outperforming state-of-the-art density-based clustering methods and traditional k-means. Because the method determines the number of clusters automatically and removes the influence of initial centroid placement on the clustering outcome, it yields stable and consistent results, addressing key limitations of conventional k-means. Its practical applicability is demonstrated on image segmentation tasks.
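For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of the standard Adjusted Rand Index (ARI) computed from two label assignments. It is not taken from the paper; the function name `adjusted_rand_index` and the toy label lists are illustrative assumptions, and the formula is the usual pair-counting definition with chance correction.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard pair-counting ARI (illustrative helper, not the authors' code)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: treat as perfect agreement

    # Contingency counts: cells n_ij, row sums a_i, column sums b_j.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    # Expected index under random labelling, and the maximum attainable index.
    expected = sum_true * sum_pred / total_pairs
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster

    return (sum_cells - expected) / (max_index - expected)


# Toy example: the same partition with swapped label names still agrees fully.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Toy example: a partition that splits every true cluster scores below zero.
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The ARI is invariant to how clusters are named, equals 1 for identical partitions, and is close to 0 (or negative) for agreement no better than chance, which is why it is a common choice when the estimated number of clusters may differ from the ground truth.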

Full Text