Abstract

Density Peaks (DP) is a recently proposed clustering algorithm that has distinctive advantages over existing clustering algorithms. It has already been used in a wide range of applications. However, DP requires computing the distance between every pair of input points and therefore incurs quadratic computation overhead, which is prohibitive for large data sets. In this paper, we propose LSH-DDP, an efficient distributed algorithm that approximates DP by exploiting Locality Sensitive Hashing (LSH). We present a formal analysis of LSH-DDP and show that its approximation quality and runtime can be controlled by tuning its parameters. Experimental results on both a local cluster and EC2 show that LSH-DDP achieves a 1.7-70x speedup over the naïve distributed DP implementation and a 2x speedup over the state-of-the-art EDDPC approach, while returning comparable clustering results.
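To make the quadratic cost and the LSH idea concrete, here is a minimal single-machine sketch in Python. It is not the LSH-DDP implementation from the paper: it covers only the local density step of DP (the number of points within a cutoff distance `d_c`), and it assumes a Gaussian random-projection hash of the form floor((a . x + b) / width). The function names and the parameters `width`, `num_hashes`, and `seed` are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import defaultdict


def exact_density(points, d_c):
    """Exact local density: number of other points closer than d_c.

    Every pair of points is compared, so the cost is quadratic in len(points).
    """
    n = len(points)
    rho = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < d_c:
                rho[i] += 1
                rho[j] += 1
    return rho


def lsh_buckets(points, width, num_hashes, seed=0):
    """Group point indices by a random-projection LSH key (illustrative)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each hash function is h(x) = floor((a . x + b) / width),
    # with a drawn from a Gaussian and b uniform in [0, width].
    funcs = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
        for _ in range(num_hashes)
    ]
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(
            math.floor((sum(a_k * x_k for a_k, x_k in zip(a, p)) + b) / width)
            for a, b in funcs
        )
        buckets[key].append(idx)
    return buckets


def approx_density(points, d_c, width, num_hashes, seed=0):
    """Approximate density: compare only points that share an LSH bucket."""
    rho = [0] * len(points)
    for members in lsh_buckets(points, width, num_hashes, seed).values():
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                if math.dist(points[i], points[j]) < d_c:
                    rho[i] += 1
                    rho[j] += 1
    return rho
```

Because the bucketed version only examines a subset of all pairs, its density estimate can never exceed the exact value. In this sketch, a larger `width` or fewer hash functions produce bigger buckets, which brings the estimate closer to the exact density at the price of more distance computations.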
