ROCK: A robust clustering algorithm for categorical attributes

Sudipto Guha, Rajeev Rastogi, Kyuseok Shim

doi:10.1016/s0306-4379(00)00022-3

Abstract

Clustering, in data mining, is useful for discovering distribution patterns in the underlying data. Clustering algorithms usually employ a similarity measure based on a distance metric (e.g., Euclidean) to partition the database so that data points in the same partition are more similar to one another than to points in different partitions. In this paper, we study clustering algorithms for data with Boolean and categorical attributes. We show that traditional clustering algorithms that use distances between points are not appropriate for Boolean and categorical attributes. Instead, we propose a novel concept of links to measure the similarity/proximity between a pair of data points. We develop a robust hierarchical clustering algorithm, ROCK, that employs links rather than distances when merging clusters. Our methods naturally extend to non-metric similarity measures, which are relevant in situations where a domain expert or similarity table is the only source of knowledge. In addition to presenting detailed complexity results for ROCK, we conduct an experimental study with real-life as well as synthetic data sets to demonstrate the effectiveness of our techniques. For data with categorical attributes, our findings indicate that ROCK not only generates better quality clusters than traditional algorithms but also exhibits good scalability properties.
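
The abstract does not spell out how links are computed. As a rough illustration only, the sketch below assumes the common-neighbor reading: two points are neighbors when their Jaccard similarity is at least a threshold, and the link count of a pair is the number of neighbors they share. The function names, the `theta` parameter, the sample baskets, and the choice to exclude a point from its own neighbor set are all assumptions made for this example, not details taken from the abstract.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity between two sets of categorical items."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def neighbors(points, theta):
    """Map each point index to the other indices with similarity >= theta.

    Assumption for this sketch: a point is not counted as its own neighbor.
    """
    nbrs = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if jaccard(points[i], points[j]) >= theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def links(points, theta):
    """Count the common neighbors of every pair of points."""
    nbrs = neighbors(points, theta)
    return {
        (i, j): len(nbrs[i] & nbrs[j])
        for i, j in combinations(range(len(points)), 2)
    }


# Hypothetical market-basket data: three grocery baskets and two wine baskets.
baskets = [
    frozenset({"milk", "bread", "eggs"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"milk", "eggs", "butter"}),
    frozenset({"wine", "cheese"}),
    frozenset({"wine", "cheese", "bread"}),
]
link_counts = links(baskets, theta=0.5)
```

With these baskets, each pair of grocery baskets shares one neighbor (the third grocery basket), while no grocery basket shares a neighbor with a wine basket, even though basket 4 and the grocery baskets have "bread" in common. That is the intuition behind preferring links to raw pairwise distances: a pair is judged by the neighborhood it sits in, not only by its own overlap.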

Journal: Information Systems
Publication Date: Jul 1, 2000
Citations: 1155

