Efficient approaches for summarizing subspace clusters into k representatives

Guanhua Chen,Dongqing Yang,Xiuli Ma,Kunqing Xie,Meng Shuai,Shiwei Tang

doi:10.1007/s00500-010-0552-8

Abstract

A major challenge in subspace clustering is that subspace clustering may generate an explosive number of clusters with high computational complexity, which severely restricts the usage of subspace clustering. The problem gets even worse with the increase of the data’s dimensionality. In this paper, we propose to summarize the set of subspace clusters into k representative clusters to alleviate the problem. Typically, subspace clusters can be clustered further into k groups, and the set of representative clusters can be selected from each group. In such a way, only the most representative subspace clusters will be returned to user. Unfortunately, when the size of the set of representative clusters is specified, the problem of finding the optimal set is NP-hard. To solve this problem efficiently, we present two approximate methods: PCoC and HCoC. The greatest advantage of our methods is that we only need a subset of subspace clusters as the input instead of the complete set of subspace clusters. Precisely, only the clusters in low-dimensional subspaces are computed and assembled into representative clusters in high-dimensional subspaces. The approximate results can be found in polynomial time. Our performance study shows both the effectiveness and efficiency of these methods.

Full Text