Abstract

The well-known Density Peak Clustering algorithm (DPC) introduced a heuristic for center detection: find density peaks and treat them as cluster centers. However, this heuristic does not work well on multi-peak clusters of complex shapes. DPC also requires the pairwise distances between data points, which makes it prohibitively time-consuming on large datasets. To overcome these problems, a Main Density Peak Clustering algorithm (MDPC+), which clusters by fast detection of main density peaks within a peak digraph, is proposed, where a main density peak is the highest density peak in a cluster. Based on this new center assumption, MDPC+ can detect the real centers of multi-peak clusters. MDPC+ treats clustering as a graph cut problem and uses a separate, purpose-built graph structure for allocating non-peak points and for allocating density peaks, so it can reasonably reconstruct clusters of complex shapes. A satellite peak attenuation technique is also embedded into MDPC+ to make it highly resistant to interference from satellite peaks (i.e., non-center density peaks). Finally, MDPC+ needs only the kNN distances of the data as input, so it is suitable for large datasets. Experimental results on both synthetic and real-world datasets demonstrate the superiority of MDPC+ in center detection, complex shape reconstruction, and running speed.
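
To make the notion of a density peak computed from kNN distances more concrete, the following Python sketch may help. It is not the MDPC+ algorithm: the density estimate (inverse mean kNN distance), the peak criterion (no denser point among a point's k nearest neighbors), and the function name `knn_density_peaks` are all assumptions chosen for illustration, and the brute-force neighbor search stands in for whatever kNN index a real implementation would use.

```python
import numpy as np

def knn_density_peaks(X, k=5):
    """Illustrative only: estimate density from kNN distances and flag local peaks.

    Assumptions (not taken from the paper): density is the inverse of the mean
    distance to the k nearest neighbors, and a point is a density peak if none
    of its k nearest neighbors has a higher density.
    """
    X = np.asarray(X, dtype=float)
    # Brute-force distances, used here only to obtain each point's kNN list.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude each point from its own neighbors

    nn_idx = np.argsort(dist, axis=1)[:, :k]            # indices of the k nearest neighbors
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)  # the corresponding kNN distances

    # Only kNN distances are used from this point on.
    density = 1.0 / (nn_dist.mean(axis=1) + 1e-12)

    # A point is a peak if it is at least as dense as every one of its neighbors.
    is_peak = density >= density[nn_idx].max(axis=1)
    return density, np.flatnonzero(is_peak)

# Two well-separated synthetic blobs of 50 points each.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
blob_b = rng.normal(loc=10.0, scale=1.0, size=(50, 2))
X = np.vstack([blob_a, blob_b])

density, peaks = knn_density_peaks(X, k=5)
# With the true blob membership known, the densest peak in each blob plays
# the role the abstract assigns to a "main density peak".
main_a = max((p for p in peaks if p < 50), key=lambda p: density[p])
main_b = max((p for p in peaks if p >= 50), key=lambda p: density[p])
print("density peaks:", peaks)
print("highest peak per blob:", main_a, main_b)
```

A blob can contain several local peaks under this criterion; in the abstract's terminology, the ones that are not the highest in their cluster would be satellite peaks. The sketch uses ground-truth blob membership to pick the highest peak, whereas MDPC+ is described as finding main peaks through its peak digraph without such labels.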
