Abstract

Although k-means and its variants are known for their remarkable efficiency, they depend strongly on prior knowledge of K and assume circle-like cluster shapes, which can lead them to partition the input space rather than discover data patterns that are not predetermined. We therefore propose beyond k-means++, a method that infers and uses explicit clusters by emphasizing local geometrical information for better cluster exploration. To remove the dependence on K, we present a novel framework of iterative division and aggregation (IDA) over k-means++: it starts from any K ≥ 1, then increases K as clusters are divided and decreases it as clusters are aggregated. To overcome the circle-like shape limitation, we introduce a reasonability checking strategy (RCS) for cluster division. Using local geometrical information, RCS supports arbitrary cluster shapes by rejecting edge patterns with a distinct convergence direction and by merging adjacent clusters with pseudo-edge patterns. Furthermore, we design an edge shrinkage strategy (ESS), which takes edge patterns as cluster prototypes and improves accuracy by avoiding the reduced representativeness caused by irregular distributions. To offset the resulting loss of efficiency, we suggest a near-maximin and random sampling algorithm for large-scale, high-dimensional data. Experimental results confirm that beyond k-means++ handles arbitrary cluster shapes with remarkable accuracy.
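For readers less familiar with the baseline the method builds on, the following Python sketch shows standard k-means++ seeding (D² sampling) followed by Lloyd iterations, plus a plain maximin (farthest-point) sampler. It is an illustration under stated assumptions, not the authors' implementation: it does not implement IDA, RCS, or ESS, the function names (`kmeanspp_seeds`, `lloyd`, `maximin_sample`) are hypothetical, and treating the abstract's "near-maximin" sampling as ordinary farthest-point sampling is an assumption made here for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Standard k-means++ seeding (D^2 sampling).

    The first center is drawn uniformly; each later center is drawn with
    probability proportional to the squared distance from a point to its
    nearest already-chosen center.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # every point already coincides with a center
            break
        # Weighted draw; points with zero weight can never be selected.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def lloyd(points: List[Point], centers: List[Point], max_iter: int = 100):
    """Lloyd iterations: alternate nearest-center assignment and mean update."""
    centers = list(centers)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(c)  # keep an empty cluster's center in place
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    return centers, assign(points, centers)


def maximin_sample(points: List[Point], m: int, rng: random.Random) -> List[Point]:
    """Plain maximin (farthest-point) sampling (assumed stand-in for the
    paper's near-maximin step): start from a random point, then repeatedly
    add the point farthest from the current sample."""
    sample = [rng.choice(points)]
    d2 = [sq_dist(p, sample[0]) for p in points]
    while len(sample) < m:
        i = max(range(len(points)), key=d2.__getitem__)
        if d2[i] == 0:  # no remaining point is distinct from the sample
            break
        sample.append(points[i])
        d2 = [min(old, sq_dist(p, points[i])) for old, p in zip(d2, points)]
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
    centers, labels = lloyd(pts, kmeanspp_seeds(pts, 2, rng))
    print(labels)
    print(maximin_sample(pts, 2, rng))
```

Note that k must be fixed in advance in this sketch; that fixed-K requirement, together with the nearest-center (circle-like) assignment rule, is exactly what IDA and RCS are described as relaxing.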

