Abstract

k-medoids clustering is a popular variant of k-means clustering and is widely used in pattern recognition and machine learning. A main drawback of k-medoids clustering is that an improper initialization can cause it to get trapped in local optima. An improved k-medoids clustering algorithm, called the INCKM algorithm, was recently proposed to overcome this drawback; it is the first to apply incremental initialization to k-medoids clustering. The INCKM algorithm requires constructing a subset of candidate medoids, determined by one hyperparameter, for initialization, and it always fails on imbalanced datasets when that hyperparameter is selected incorrectly. In this paper, we propose a novel k-medoids clustering algorithm, called the incremental k-means++ (INCKPP) algorithm, which initializes in a novel incremental manner, attempting to optimally add one new cluster center at each stage through a non-parametric and stochastic k-means++ initialization. The INCKPP algorithm overcomes the difficulty of hyperparameter selection in the INCKM algorithm, improves the clustering performance, and handles imbalanced datasets well. However, the INCKPP algorithm is not computationally efficient enough. To address this, we further propose an improved INCKPP algorithm, called the INCKPP[Formula: see text] algorithm, which improves the clustering efficiency while maintaining the clustering performance of the INCKPP algorithm. Extensive experiments on both synthetic and real-world datasets, including imbalanced datasets, illustrate that the proposed algorithms outperform the other compared algorithms.
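For readers unfamiliar with the k-means++ initialization mentioned above, the following is a minimal sketch of standard k-means++ seeding (sampling each new center with probability proportional to its squared distance from the nearest center already chosen), restricted to data points so that the results can serve as initial medoids. It is not the INCKPP algorithm itself, whose incremental stages are not specified in this abstract. The function names `sq_dist` and `kmeanspp_medoid_seeds` and the use of squared Euclidean distance are assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_medoid_seeds(points, k, rng=None):
    """Return k distinct indices into `points`, chosen by k-means++ seeding.

    Illustrative helper (hypothetical name), not the paper's INCKPP procedure.
    """
    rng = rng or random.Random()
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(points)")

    # The first seed is drawn uniformly at random.
    seeds = [rng.randrange(n)]
    # d2[i] is the squared distance from point i to its nearest chosen seed.
    d2 = [sq_dist(p, points[seeds[0]]) for p in points]

    while len(seeds) < k:
        if sum(d2) == 0:
            # Every point coincides with a seed; fall back to any unused index.
            nxt = next(i for i in range(n) if i not in seeds)
        else:
            # Sample proportionally to squared distance; seeds have weight 0.
            nxt = rng.choices(range(n), weights=d2, k=1)[0]
        seeds.append(nxt)
        # Update nearest-seed distances with the newly added seed.
        d2 = [min(d, sq_dist(p, points[nxt])) for d, p in zip(d2, points)]
    return seeds
```

Calling `kmeanspp_medoid_seeds(points, 3, random.Random(0))` on a list of coordinate tuples returns three indices that can be passed to any k-medoids refinement step as the starting medoids. Because the sampling is stochastic, different random seeds generally give different initializations.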

