Abstract

The K-means clustering algorithm is well known for its computational simplicity. In this algorithm, essential cluster-level information is captured by the K cluster centroids. However, how well those centroids reveal the structure of the underlying data depends on the choice of K. In this paper, we propose a clustering algorithm that learns the number of clusters K while it performs the clustering. Our work rests on two observations: i) a sufficiently large random sample of a dataset may have a distribution similar to that of the original data, and ii) at the true number of clusters, the centroids generated from a sampled dataset approximate the cluster centroids generated from the original dataset. The first observation paves the way for a scalable solution, and the second forms the key building block of the proposed algorithm. We have tested our method on several real and synthetic datasets. Our method addresses several pertinent issues in clustering: 1) detection of a single cluster in the absence of any other cluster in a dataset, 2) the presence of hierarchy, 3) clustering of high-dimensional datasets, 4) robustness to cluster imbalance, and 5) robustness to noise. We have observed significant improvements in speed and quality, both for predicting the number of clusters and for determining the composition of clusters in large datasets.
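The second observation suggests a simple diagnostic, sketched below in Python. This is a minimal illustration of the sampling idea under stated assumptions, not the authors' published algorithm: for each candidate K, K-means centroids fitted on a random sample are matched to centroids fitted on the full dataset, and the mean matched distance is reported; at the true K this discrepancy should be small. The sample fraction (0.1), the candidate range, and the helper name centroid_discrepancy are illustrative choices, not from the paper.

    # Sketch of the sampling observation: compare K-means centroids fitted
    # on a random sample with those fitted on the full dataset, for each
    # candidate K. A small mean matched distance suggests the candidate K
    # is close to the true number of clusters.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from scipy.optimize import linear_sum_assignment
    from scipy.spatial.distance import cdist

    def centroid_discrepancy(X, k, sample_frac=0.1, seed=0):
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(X), size=int(sample_frac * len(X)), replace=False)
        c_full = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).cluster_centers_
        c_samp = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X[idx]).cluster_centers_
        # K-means cluster labels are arbitrary, so optimally match sample
        # centroids to full-data centroids before measuring the gap.
        cost = cdist(c_full, c_samp)
        rows, cols = linear_sum_assignment(cost)
        return cost[rows, cols].mean()

    # Synthetic data with 4 well-separated clusters; the discrepancy
    # should be smallest near k = 4.
    X, _ = make_blobs(n_samples=5000, centers=4, random_state=42)
    for k in range(1, 9):
        print(k, round(centroid_discrepancy(X, k), 4))

Matching centroids with the Hungarian algorithm before comparison is essential here, since the ordering of K-means centroids carries no meaning.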
