Subspace Clustering of Categorical and Numerical Data With an Unknown Number of Clusters.

Hong Jia, Yiu-Ming Cheung

doi:10.1109/tnnls.2017.2728138

Abstract

In clustering analysis, data attributes may contribute differently to the detection of various clusters. To address this, the subspace clustering technique has been developed, which aims at grouping data objects into clusters based on subsets of attributes rather than the entire data space. However, most existing subspace clustering methods are applicable to either numerical or categorical data, but not both. This paper therefore studies the soft subspace clustering of data with both numerical and categorical attributes (called mixed data for short). Specifically, an attribute-weighted clustering model based on the definition of object-cluster similarity is presented. Accordingly, a unified weighting scheme for the numerical and categorical attributes is proposed, which quantifies the attribute-to-cluster contribution by taking into account both intercluster difference and intracluster similarity. Moreover, a rival penalized competitive learning mechanism is introduced into the proposed soft subspace clustering algorithm so that the subspace cluster structure and the most appropriate number of clusters can be learned simultaneously in a single learning paradigm. In addition, an initialization-oriented method is presented, which can effectively improve the stability and accuracy of k-means-type clustering methods on numerical, categorical, and mixed data. Experimental results on different benchmark data sets show the efficacy of the proposed approach.
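As a rough illustration of the rival penalized competitive learning idea mentioned in the abstract, the sketch below shows one simplified update step on purely numerical data. It is not the paper's algorithm: the function name `rpcl_step`, the parameters `lr` and `penalty`, and the squared Euclidean distance are assumptions made for illustration, and the sketch omits the winning-frequency weighting of the classic formulation as well as the attribute weights and categorical handling the paper proposes. The nearest center (the winner) moves toward the input, while the second-nearest center (the rival) is pushed slightly away, which is the mechanism that can drive redundant centers out of the data region.

```python
def rpcl_step(centers, x, lr=0.1, penalty=0.01):
    """One simplified rival-penalized update for a numerical point x.

    centers: list of lists of floats (at least two), updated in place.
    lr:      learning rate for the winner (hypothetical default).
    penalty: much smaller de-learning rate for the rival (hypothetical default).
    Returns the (winner, rival) indices.
    """
    def sq_dist(c):
        # Squared Euclidean distance between a center and the input.
        return sum((ci - xi) ** 2 for ci, xi in zip(c, x))

    # Rank centers by distance: nearest is the winner, second nearest the rival.
    order = sorted(range(len(centers)), key=lambda j: sq_dist(centers[j]))
    winner, rival = order[0], order[1]

    # Winner learns: move toward the input.
    centers[winner] = [c + lr * (xi - c) for c, xi in zip(centers[winner], x)]
    # Rival is penalized: move away from the input by a small amount.
    centers[rival] = [c - penalty * (xi - c) for c, xi in zip(centers[rival], x)]
    return winner, rival


centers = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
winner, rival = rpcl_step(centers, [0.2, 0.0])
print(winner, rival, centers)
```

Keeping `penalty` much smaller than `lr` is the usual design choice in this family of methods, so that rivals drift away gradually instead of being ejected by a single input.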

Journal: IEEE Transactions on Neural Networks and Learning Systems
Publication Date: Aug 3, 2017
Citations: 127

Similar Papers

A Unified Metric for Categorical and Numerical Attributes in Data Clustering
Yiu-Ming Cheung ... Hong Jia
01 Jan 2013

Clustering of Mixed Data by Integrating Fuzzy, Probabilistic, and Collaborative Clustering Framework
Arkanath Pathak ... Nikhil R Pal
International Journal of Fuzzy Systems | VOL. 18
02 Apr 2016

MMDBC: Density-Based Clustering Algorithm for Mixed Attributes and Multi-dimension Data
Haizhou Du ... Shuqing Zeng
01 Jan 2018

Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number
Yiu-Ming Cheung ... Hong Jia
Pattern Recognition | VOL. 46
31 Jan 2013
