Abstract

Density Peak Clustering (DPC) is a highly effective density-based clustering algorithm, but its scalability is limited by the expensive Density Peak Estimation (DPE) step. To address this challenge, we propose UP-DPC (Ultra-Scalable Parallel Density Peak Clustering), a novel framework that employs approximate Density Peak Estimation and performs DPC on LDP-wise graphs. This approach enables UP-DPC to handle datasets of arbitrary scale without relying on spatial indexing for acceleration. Furthermore, we introduce a five-layer computational architecture and leverage parallel computation techniques to further improve the speed and efficiency of UP-DPC. To evaluate the scalability and effectiveness of UP-DPC, we conduct extensive experiments on 14 datasets, including large and web-scale datasets, and compare UP-DPC with 21 algorithms. Notably, on the MNIST8M dataset, which consists of 8,000k data objects, UP-DPC achieves an NMI (Normalized Mutual Information) value of 0.6464 in just 35.41 seconds, outperforming the state-of-the-art GPU-based method, which achieves an NMI of only 0.045 in 56.96 seconds. These results demonstrate the scalability and effectiveness of UP-DPC on large and web-scale datasets, and they show that the proposed framework is a promising solution for density-based clustering tasks.
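For context, the cost of the DPE step can be seen in a minimal sketch of the exact computation used by classic DPC, where each point gets a local density (here, the number of other points within a cutoff distance) and the distance to its nearest point of higher density. This is not UP-DPC's approximate DPE, which the abstract does not detail; the function name `density_peak_estimation` and the cutoff parameter `d_c` are illustrative assumptions.

```python
import numpy as np

def density_peak_estimation(X, d_c):
    """Exact DPE as in classic DPC (hypothetical helper, not UP-DPC).

    Builds the full pairwise distance matrix, so time and memory grow
    quadratically with the number of points.
    """
    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density: number of other points closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1

    n = len(X)
    delta = np.empty(n)
    parent = np.full(n, -1)

    # Visit points by decreasing density; ties are broken by index.
    order = np.argsort(-rho, kind="stable")
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour; use its
            # largest distance to any point by convention.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]   # distance to nearest denser point
        parent[i] = j           # that point's index
    return rho, delta, parent
```

Points with both high `rho` and high `delta` are the density peaks (cluster centers), and every other point can follow its `parent` link to reach one. The quadratic distance matrix in this sketch is the kind of cost that an approximate DPE step is meant to avoid at web scale.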
