Abstract
A common class of graph structural clustering algorithms, pioneered by SCAN (Structural Clustering Algorithm for Networks), not only find clusters among vertices but also classify vertices as cores, hubs and outliers. However, these algorithms suffer from efficiency issues due to the great amount of computation required on structural similarity among vertices. Pruning-based SCAN algorithms improve efficiency by reducing the amount of computation. Nevertheless, this structural similarity computation is still the performance bottleneck, especially on big graphs of billions of edges. In this paper, we propose to parallelize pruning-based SCAN algorithms on multi-core CPUs and Intel Xeon Phi Processors (KNL) with multiple threads and vectorized instructions. Specifically, we design ppSCAN, a multi-phase vertex computation based parallel algorithm, to avoid redundant computation and achieve scalability. Moreover, we propose a pivot-based vectorized set intersection algorithm for structural similarity computation. Experimental results show that ppSCAN is scalable on both CPU and KNL with respect to the number of threads. On the 1.8 billion-edge graph friendster, our ppSCAN completes within 65 seconds on KNL (64 physical cores with hyper-threading). This performance is 100x-130x faster than our single-threaded version, and up to 250x faster than pSCAN, the state-of-the-art sequential algorithm, on the same platform.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have