QuickDSC: Clustering by Quick Density Subgraph Estimation

Xichen Zheng, Chengsen Ren, Yiyang Yang, Zhiguo Gong, Xiang Chen, Zhifeng Hao

doi:10.1016/j.ins.2021.09.048

Abstract

Density-based clustering is a long-standing research topic, valued for its ability to detect clusters of arbitrary shape. Moreover, by means of a Density Estimator (DE), density-based methods such as MeanShift and QuickShift can locate local density maxima, or Modes, which are excellent representatives of the clusters. However, concentrating on the modes alone can lead to over-segmentation. Furthermore, most density-based methods cannot handle scenarios that require partitioning the data samples into exactly K clusters. To overcome these issues, this work proposes QuickDSC, a novel and efficient clustering algorithm that groups samples through Quick Density Subgraph Estimation. It first identifies high-density-connected samples as Density Subgraphs (DSs). It then estimates the importance of each DS from two aspects: density and geometric weight. The top-K most important DSs are selected as the cluster centers, and the cluster memberships of the remaining samples are determined from them. QuickDSC combines three crucial clustering attributes: (1) its cluster centroids are modes (as in density-based methods); (2) it returns results efficiently by exploiting the underlying density structure (as in hierarchical clustering methods); and (3) it explicitly returns K clusters (as in K-Means and K-Modes). In addition, QuickDSC is efficient both theoretically and empirically: it is only slightly slower than classical clustering methods such as K-Means and DBSCAN. Experiments on artificial and real-world datasets demonstrate the advantages of the proposed method, whose clustering quality outperforms state-of-the-art approaches.
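To make the four-step pipeline above more concrete, here is a minimal pure-Python sketch. It rests on several assumptions that the abstract does not specify. Densities come from a Gaussian kernel density estimate (up to a constant factor) with a hypothetical `bandwidth`. Each sample links to its nearest higher-density neighbor within a hypothetical radius `tau` (the QuickShift rule), and each resulting tree is treated as one density subgraph. A subgraph's importance is taken to be its peak density times its size, standing in for the paper's density and geometric weight. Every unselected subgraph joins the cluster of the selected sample nearest to its root. The names `quick_dsc_sketch` and `kde` are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two samples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kde(points: Sequence[Point], bandwidth: float) -> List[float]:
    """Gaussian kernel density estimate at every sample (up to a constant factor)."""
    n = len(points)
    h2 = 2.0 * bandwidth ** 2
    return [sum(math.exp(-sq_dist(p, q) / h2) for q in points) / n for p in points]


def quick_dsc_sketch(points: Sequence[Point], k: int,
                     bandwidth: float = 1.0, tau: float = 1.0) -> List[int]:
    """Hypothetical QuickDSC-style clustering; returns one label per sample
    (at most k distinct labels, numbered from 0)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(points)
    dens = kde(points, bandwidth)
    rank = [(dens[i], i) for i in range(n)]  # total order: ties broken by index

    # Step 1 (QuickShift rule): link each sample to its nearest higher-ranked
    # sample within radius tau; samples with no such neighbor become roots.
    parent = list(range(n))
    for i in range(n):
        best_j, best_d = i, tau ** 2
        for j in range(n):
            if rank[j] > rank[i]:
                d = sq_dist(points[i], points[j])
                if d < best_d:
                    best_j, best_d = j, d
        parent[i] = best_j

    def find_root(i: int) -> int:
        # Parents always have a strictly higher rank, so this loop terminates.
        while parent[i] != i:
            i = parent[i]
        return i

    roots = [find_root(i) for i in range(n)]

    # Step 2: ASSUMPTION: each tree of the forest is one density subgraph (DS).
    members: Dict[int, List[int]] = {}
    for i, r in enumerate(roots):
        members.setdefault(r, []).append(i)

    # Step 3: ASSUMPTION: importance = peak density * number of members,
    # a stand-in for the paper's density and geometric weight.
    ranked = sorted(members, key=lambda r: dens[r] * len(members[r]), reverse=True)
    label_of_root = {r: c for c, r in enumerate(ranked[:k])}

    # Step 4: ASSUMPTION: each unselected DS joins the cluster of the
    # selected sample nearest to its root.
    selected = [i for i in range(n) if roots[i] in label_of_root]
    for r in ranked[k:]:
        nearest = min(selected, key=lambda s: sq_dist(points[r], points[s]))
        label_of_root[r] = label_of_root[roots[nearest]]

    return [label_of_root[roots[i]] for i in range(n)]


if __name__ == "__main__":
    blobs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    print(quick_dsc_sketch(blobs, k=2, bandwidth=0.5, tau=1.0))
```

With two small, well-separated blobs and `k=2`, the sketch should return two clusters. An isolated extra point would then form its own low-importance subgraph and be absorbed by the nearer blob rather than becoming a cluster of its own. The brute-force pairwise loops cost O(n²) and are chosen for readability; a practical version would need faster neighbor searches.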
