An efficient clustering algorithm for mixed type attributes in large dataset

Jian Yin, Zhi-Fang Tan, Yi-Qun Chen, Jiang-Tao Ren

doi:10.1109/icmlc.2005.1527202

Abstract

Clustering is a widely used technique in data mining, and many clustering algorithms exist. Most of them, however, either handle attributes of only a single type or handle mixed types but are inefficient on large data sets; few do both well. In this article, we propose a clustering algorithm that handles large datasets with mixed-type attributes. We first use a CF*tree (analogous to the CF-tree in BIRCH) to pre-cluster the dataset, so that the dense regions are stored in its leaf nodes. We then treat each dense region as a single point and cluster these regions with an improved k-prototype algorithm. Experiments show that the algorithm is very efficient at clustering large datasets with mixed-type attributes.
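The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a single-pass threshold scan stands in for the CF*tree, the second stage is plain k-prototypes weighted by region size rather than the improved variant the paper proposes, and the attribute types are assumed to be numeric and categorical. All names (`Region`, `pre_cluster`, `cluster_regions`, `gamma`, `threshold`) are hypothetical.

```python
from collections import Counter


def mixed_distance(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """k-prototypes dissimilarity: squared Euclidean distance on numeric
    attributes plus gamma times the number of categorical mismatches."""
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * mismatches


class Region:
    """Summary of one dense region (assumed stand-in for a CF*tree leaf entry):
    point count, numeric sums, and per-attribute category counts."""

    def __init__(self, num, cat):
        self.n = 1
        self.num_sum = list(num)
        self.cat_counts = [Counter([c]) for c in cat]

    def add(self, num, cat):
        self.n += 1
        self.num_sum = [s + x for s, x in zip(self.num_sum, num)]
        for counter, c in zip(self.cat_counts, cat):
            counter[c] += 1

    @property
    def centre(self):
        # Numeric mean and categorical mode represent the region as one point.
        mean = [s / self.n for s in self.num_sum]
        mode = [counter.most_common(1)[0][0] for counter in self.cat_counts]
        return mean, mode


def pre_cluster(points, threshold, gamma=1.0):
    """Stage 1 (simplified): one pass over the data, absorbing each point into
    the nearest region within `threshold`, else opening a new region."""
    regions = []
    for num, cat in points:
        best, best_d = None, threshold
        for region in regions:
            d = mixed_distance(num, cat, *region.centre, gamma=gamma)
            if d <= best_d:
                best, best_d = region, d
        if best is None:
            regions.append(Region(num, cat))
        else:
            best.add(num, cat)
    return regions


def cluster_regions(regions, k, gamma=1.0, iterations=10):
    """Stage 2 (simplified): k-prototypes over region summaries, with each
    region weighted by its point count; seeded with the k largest regions."""
    seeds = sorted(regions, key=lambda r: r.n, reverse=True)[:k]
    prototypes = [r.centre for r in seeds]
    labels = []
    for _ in range(iterations):
        # Assign every region to its nearest prototype.
        labels = [
            min(range(len(prototypes)),
                key=lambda j: mixed_distance(*r.centre, *prototypes[j], gamma=gamma))
            for r in regions
        ]
        # Recompute each prototype from the summaries of its member regions.
        for j in range(len(prototypes)):
            members = [r for r, label in zip(regions, labels) if label == j]
            if not members:
                continue
            total = sum(r.n for r in members)
            dims = len(members[0].num_sum)
            mean = [sum(r.num_sum[d] for r in members) / total for d in range(dims)]
            mode = []
            for a in range(len(members[0].cat_counts)):
                merged = Counter()
                for r in members:
                    merged.update(r.cat_counts[a])
                mode.append(merged.most_common(1)[0][0])
            prototypes[j] = (mean, mode)
    return labels, prototypes
```

Each point is passed as a pair `(numeric_values, categorical_values)`. Because the second stage works only on region summaries, its cost depends on the number of dense regions rather than on the number of original points, which is where the efficiency gain claimed in the abstract would come from.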

Full Text