Abstract

In high-dimensional data spaces, clusters are likely to exist in different subspaces. K-means is a classic clustering algorithm, but it cannot find subspace clusters. In this paper, we design an algorithm called GKM that generalizes the k-means algorithm to high-dimensional data. In the objective function of GKM, we associate a weight vector with each cluster to indicate which dimensions are relevant to that cluster. To prevent the value of the objective function from decreasing merely because dimensions are eliminated, virtual dimensions are added to the objective function. The values of data points on the virtual dimensions are set artificially so that the objective function is minimized when the real subspace clusters, or the clusters in the original space, are found. GKM preserves the advantages of k-means: it can identify subspace clusters with linear time complexity. Our performance study on a synthetic dataset and a real dataset demonstrates the efficiency and effectiveness of GKM.
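
The per-cluster weighting idea above can be illustrated with a minimal k-means-style sketch. This is not GKM itself: it assumes each cluster has a fixed, user-supplied weight vector and uses a per-cluster weighted squared Euclidean distance. The abstract does not specify how GKM updates the weights or which values it assigns on the virtual dimensions, so both are deliberately omitted. All function names (`weighted_sq_dist`, `objective`, `assign`, `update_centers`, `weighted_kmeans`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def weighted_sq_dist(x: Vector, center: Vector, weights: Vector) -> float:
    """Squared Euclidean distance where dimension j is scaled by weights[j]."""
    return sum(w * (a - c) ** 2 for a, c, w in zip(x, center, weights))


def objective(points: List[Vector], labels: List[int],
              centers: List[Vector], weights: List[Vector]) -> float:
    """Sum of each point's weighted distance to its assigned center,
    using the weight vector of the cluster it belongs to (assumed form)."""
    return sum(weighted_sq_dist(p, centers[l], weights[l])
               for p, l in zip(points, labels))


def assign(points: List[Vector], centers: List[Vector],
           weights: List[Vector]) -> List[int]:
    """Assign each point to the cluster with the smallest weighted distance."""
    return [min(range(len(centers)),
                key=lambda l: weighted_sq_dist(p, centers[l], weights[l]))
            for p in points]


def update_centers(points: List[Vector], labels: List[int],
                   centers: List[Vector]) -> List[List[float]]:
    """With fixed non-negative weights, the per-dimension mean of a cluster's
    members minimizes its weighted distance sum; empty clusters keep their center."""
    new_centers = []
    for l, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == l]
        if not members:
            new_centers.append(list(old))
        else:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
    return new_centers


def weighted_kmeans(points: List[Vector], init_centers: List[Vector],
                    weights: List[Vector],
                    max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Alternate assignment and center updates until labels stop changing."""
    centers = [list(c) for c in init_centers]
    labels = assign(points, centers, weights)
    for _ in range(max_iter):
        centers = update_centers(points, labels, centers)
        new_labels = assign(points, centers, weights)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


if __name__ == "__main__":
    pts = [[0.0, 0.0], [1.0, 100.0], [9.0, -100.0], [10.0, 0.0]]
    w = [[1.0, 0.0], [1.0, 0.0]]  # dimension 1 marked irrelevant for both clusters
    labels, centers = weighted_kmeans(pts, [[0.0, 0.0], [10.0, 0.0]], w)
    print(labels, centers, objective(pts, labels, centers, w))
```

With weights `[1, 0]` for both clusters, dimension 1 is ignored and the four points split by their first coordinate. Setting every weight to zero drives this objective to 0 for any clustering, which is the kind of degenerate decrease from eliminating dimensions that the virtual dimensions in GKM are described as preventing.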
