Empirical comparison of fast partitioning-based clustering algorithms for large data sets

Chih-Ping Wei, Che-Ming Hsu, Yen-Hsien Lee

doi:10.1016/s0957-4174(02)00185-9

Abstract

Several fast algorithms for clustering very large data sets have been proposed in the literature, including CLARA, CLARANS, GAC-R3, and GAC-RARw. CLARA is a combination of a sampling procedure and the classical PAM algorithm, while CLARANS adopts a serial randomized search strategy to find the optimal set of medoids. GAC-R3 and GAC-RARw exploit genetic search heuristics for solving clustering problems. In this research, we conducted an empirical comparison of these four clustering algorithms over a wide range of data characteristics described by data size, number of clusters, cluster distinctness, cluster asymmetry, and data randomness. According to the experimental results, CLARANS outperforms its counterparts both in clustering quality and execution time when the number of clusters increases, clusters are more closely related, more asymmetric clusters are present, or more random objects exist in the data set. With a specific number of clusters, CLARA can efficiently achieve satisfactory clustering quality when the data size is larger, whereas GAC-R3 and GAC-RARw can achieve satisfactory clustering quality and efficiency when the data size is small, the number of clusters is small, and clusters are more distinct and symmetric.
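To make the abstract's description of CLARA concrete (a sampling procedure combined with PAM), the following is a minimal Python sketch, not the implementation evaluated in the paper. It rests on several assumptions that do not come from the source: points are 2-D tuples compared by Euclidean distance, the `pam` helper is a simplified swap-based k-medoids search with random initialization rather than the full classical PAM with its build phase, and the names `total_cost`, `pam`, `clara` and the defaults `n_samples=5`, `sample_size=40` are illustrative.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam(points, k, rng):
    """Simplified PAM (assumption: random start, first-improvement swaps).

    Repeatedly replaces a medoid with a non-medoid whenever the swap
    lowers the total cost, and stops when no swap helps.
    """
    medoids = rng.sample(points, k)
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids


def clara(points, k, n_samples=5, sample_size=40, seed=0):
    """CLARA-style search: run PAM on random samples, score on all data.

    Each sample yields a candidate medoid set; the set with the lowest
    cost over the full data set is kept.
    """
    rng = random.Random(seed)
    best_medoids, best_cost = None, math.inf
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k, rng)
        cost = total_cost(points, medoids)  # evaluated on the full data
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost
```

Calling `clara(points, k=2)` on a list of `(x, y)` tuples returns the chosen medoids and their total cost over the whole data set. The design point this illustrates is that the expensive swap search only ever sees a small sample, while the full data set is touched once per sample for scoring, which is why the abstract reports CLARA as efficient when the data size is large.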

Journal: Expert Systems with Applications
Publication Date: Jan 20, 2003
Citations: 48
