An improved density peak clustering algorithm guided by pseudo labels

Yizhang Wang, Wei Pang, Jingchu Zhou

doi:10.1016/j.knosys.2022.109374

Abstract

Density peak clustering algorithms and their variants have achieved promising results in many fields over the last few years. However, most of these algorithms have parameters that must be fine-tuned by users. For real-world data without ground truth, identifying good parameter values for parametric clustering algorithms is often challenging and time-consuming. To address this, we propose a density peak clustering algorithm guided by pseudo labels (PLDPC), which avoids manually pre-specified parameters by applying a mutual information criterion. Specifically, we first design a novel pseudo-label generation method based on the theory of co-occurrence. Then, we maximize mutual information to obtain better clustering results. To evaluate the effectiveness of the proposed PLDPC algorithm, we conduct extensive experiments on 23 datasets, comprising six synthetic and seventeen real-world datasets. The experimental results show that PLDPC outperforms three classical algorithms (K-means, DPC, and DBSCAN) and eight state-of-the-art (SOTA) clustering algorithms in most cases.
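The abstract does not spell out the exact criterion, so the following is only a minimal sketch of the general idea of choosing among candidate clusterings by their mutual information with a set of pseudo labels. It is not the PLDPC procedure itself: the function names `mutual_information` and `select_by_mutual_information`, the dictionary of candidate labelings, and the use of plain (unnormalized) mutual information in nats are all assumptions made for illustration.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same points."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    count_a = Counter(labels_a)                 # cluster sizes in labeling A
    count_b = Counter(labels_b)                 # cluster sizes in labeling B
    count_ab = Counter(zip(labels_a, labels_b))  # joint counts
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) )
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def select_by_mutual_information(pseudo_labels, candidates):
    """Hypothetical helper: return the key of the candidate labeling
    (for example, one per parameter setting) that shares the most
    information with the pseudo labels."""
    return max(
        candidates,
        key=lambda name: mutual_information(pseudo_labels, candidates[name]),
    )


if __name__ == "__main__":
    pseudo = [0, 0, 1, 1]
    candidates = {
        "setting_a": [0, 1, 0, 1],  # independent of the pseudo labels
        "setting_b": [5, 5, 7, 7],  # same partition, different label names
    }
    print(select_by_mutual_information(pseudo, candidates))
```

Mutual information is invariant to how clusters are named, which is why `setting_b` is preferred in the usage example even though its label values differ from the pseudo labels.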

Full Text