An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data

Liping Jing, Michael K. Ng, Joshua Zhexue Huang

doi:10.1109/tkde.2007.1048

Abstract

This paper presents a new k-means type algorithm for clustering high-dimensional objects in subspaces. In high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space. For example, in text clustering, clusters of documents on different topics are characterized by different subsets of terms or keywords, and the keywords for one cluster may not occur in the documents of other clusters. This data sparsity problem is typical when clustering high-dimensional data. In the new algorithm, we extend the k-means clustering process to calculate a weight for each dimension in each cluster and use the weight values to identify the subsets of important dimensions that categorize different clusters. This is achieved by including the weight entropy in the objective function that is minimized in the k-means clustering process. An additional step is added to the k-means clustering process to automatically compute the weights of all dimensions in each cluster. Experiments on both synthetic and real data show that the new algorithm can generate better clustering results than other subspace clustering algorithms. The new algorithm is also scalable to large data sets.
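
The k-means loop with an added weighting step, as described in the abstract, can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes an objective of the form: the sum over clusters l, objects j in cluster l, and dimensions i of λ_li (z_li - x_ji)^2, plus γ times the sum over l and i of λ_li log λ_li, with each cluster's weights λ_l summing to 1. The abstract itself does not give this formula. Under that assumption the weight step has the closed form λ_li = exp(-D_li/γ) / Σ_t exp(-D_lt/γ), where D_li is the dispersion of cluster l along dimension i. The function name ewkm and the parameters gamma, init and seed are hypothetical.

```python
import math
import random


def ewkm(X, k, gamma=1.0, max_iter=100, init=None, seed=0):
    """Sketch of entropy-weighted k-means (hypothetical helper, not the authors' code).

    X      -- list of n objects, each a list of m floats
    k      -- number of clusters
    gamma  -- assumed positive parameter weighting the entropy term;
              larger values push each cluster's weights toward uniform
    init   -- optional list of k starting centers; otherwise k objects
              are sampled at random using `seed`
    Returns (labels, centers, weights); weights[l][i] is the weight of
    dimension i in cluster l, and each row sums to 1.
    """
    n, m = len(X), len(X[0])
    if init is not None:
        centers = [list(c) for c in init]
    else:
        rng = random.Random(seed)
        centers = [list(X[j]) for j in rng.sample(range(n), k)]
    weights = [[1.0 / m] * m for _ in range(k)]  # start with equal weights
    labels = [-1] * n

    for _ in range(max_iter):
        # Step 1: assign each object to the cluster with the smallest
        # weighted squared distance sum_i w[l][i] * (z[l][i] - x[i])^2.
        new_labels = []
        for x in X:
            dists = [sum(w[i] * (c[i] - x[i]) ** 2 for i in range(m))
                     for c, w in zip(centers, weights)]
            new_labels.append(min(range(k), key=dists.__getitem__))

        # Step 2: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        for l in range(k):
            members = [X[j] for j in range(n) if new_labels[j] == l]
            if members:
                centers[l] = [sum(p[i] for p in members) / len(members)
                              for i in range(m)]

        # Step 3 (the additional step): recompute dimension weights per cluster.
        # D[i] is the within-cluster dispersion along dimension i; under the
        # assumed entropy objective the minimizer is softmax(-D / gamma).
        for l in range(k):
            D = [sum((centers[l][i] - X[j][i]) ** 2
                     for j in range(n) if new_labels[j] == l)
                 for i in range(m)]
            d_min = min(D)  # shift for numerical stability; ratios unchanged
            e = [math.exp(-(d - d_min) / gamma) for d in D]
            total = sum(e)
            weights[l] = [v / total for v in e]

        if new_labels == labels:  # assignments are stable: stop
            break
        labels = new_labels

    return labels, centers, weights
```

For example, with X = [[0.0, 0.0, 5.0], [0.1, 0.0, -5.0], [0.0, 0.1, 3.0], [10.0, 10.0, 0.0], [10.1, 10.0, 0.1], [10.0, 10.1, -0.1]] and init=[X[0], X[3]], the sketch returns labels [0, 0, 0, 1, 1, 1]. The first cluster is tight along dimensions 0 and 1 but widely spread along dimension 2, so its weight on dimension 2 is driven close to zero. This mirrors, in miniature, the abstract's idea that each cluster is characterized by its own subset of important dimensions.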

Journal: IEEE Transactions on Knowledge and Data Engineering
Publication Date: Aug 1, 2007
Citations: 597

Similar Papers

A novel attribute weighting algorithm for clustering high-dimensional categorical data
Liang Bai ... Fuyuan Cao
Pattern Recognition | VOL. 44 | 10 May 2011

Nonlinear discriminant clustering based on spectral regularization
Yubin Zhan ... Jianping Yin
Neural Computing and Applications | VOL. 22 | 19 Apr 2012

Clustering High-Dimensional Stock Data using Data Mining Approach
Dhea Indriyanti ... Arian Dhini
01 Jul 2019

Clustering of High Dimensional Handwritten Data by an Improved Hypergraph Partition Method
Tian Wang ... Yonggang Lu
01 Jan 2017
