Abstract

The Affinity Propagation (AP) algorithm is a useful clustering technique with several noteworthy advantages, and it has been applied successfully in many domains. However, it does not scale to large data sets, because its computational time and memory usage both grow quadratically with the number of data points. In this paper, we focus on the needs of big data analytics and propose an effective and efficient scheme that reduces the computational complexity and memory usage of the AP algorithm. The basic idea of our approach is to embed the data points into distance-preserving binary codes and then decompose the original large data set into a series of small subsets by aggregating similar data points according to their binary codes. Experimental results and an application to real-world astronomical spectral data demonstrate the effectiveness of our approach both quantitatively and visually.
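
To make the decomposition idea concrete, the following is a minimal sketch, not the scheme proposed in the paper. It assumes random-hyperplane sign hashing as a stand-in for the binary embedding, which roughly preserves angular similarity; the abstract does not specify which embedding is used. The names `binary_code`, `partition_by_code`, `n_bits`, and `seed` are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def binary_code(point, hyperplanes):
    """Return one bit per hyperplane: 1 if the point lies on its non-negative side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, point)) >= 0 else 0
        for plane in hyperplanes
    )


def partition_by_code(points, n_bits=4, seed=0):
    """Group point indices by binary code (assumed embedding: random hyperplanes)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Hypothetical choice: Gaussian hyperplanes through the origin.
    hyperplanes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
    buckets = defaultdict(list)
    for idx, point in enumerate(points):
        buckets[binary_code(point, hyperplanes)].append(idx)
    return dict(buckets)


# Toy data: two points in one direction and their negations.
points = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0)]
buckets = partition_by_code(points)
for code, members in buckets.items():
    print(code, members)
```

Under these assumptions, a clustering routine such as AP could then be run on each bucket separately, so the quadratic cost would apply to the bucket sizes rather than to the full data set. How the per-bucket results are merged is not described in the abstract and is left out of this sketch.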
