Abstract

How can we find all connected components in an enormous graph with billions of nodes and edges? Finding connected components is a fundamental operation for various graph computation tasks such as pattern recognition, reachability, and graph compression. Many algorithms have been proposed over the decades, but most of them do not scale to recent web-scale graphs. Recently, a MapReduce algorithm was proposed to handle such large graphs. However, that algorithm repeatedly reads and writes large amounts of intermediate data, which overloads the network and prolongs the running time. In this paper, we propose PACC (Partition-Aware Connected Components), a new distributed algorithm that uses graph partitioning for load balancing and edge filtering. Experimental results on real-world graphs show that PACC significantly reduces the intermediate data and runs up to 10 times faster than the current state-of-the-art MapReduce algorithm.
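The abstract contrasts PACC with iterative MapReduce approaches that rewrite intermediate data on every round. As a hedged illustration of that cost (this is not PACC, and not necessarily the specific MapReduce algorithm the paper compares against), the sketch below simulates generic min-label propagation in memory. The function name `connected_components` and its returned counters are hypothetical. In each round every node sends its current label to all neighbors (a "map" step) and each node keeps the smallest label it receives (a "reduce" step), so the messages per round grow with the number of edges and the number of rounds grows with the graph's diameter.

```python
from collections import defaultdict


def connected_components(edges):
    """Generic min-label propagation (illustrative sketch, not PACC).

    Assumes node IDs are mutually comparable (e.g. ints). Nodes that appear
    in no edge are not represented. Returns (labels, rounds, messages), where
    labels maps each node to the smallest node ID in its component.
    """
    # Build an undirected adjacency list.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    labels = {node: node for node in adj}  # every node starts as its own label
    rounds = 0
    messages = 0  # total label messages, a proxy for intermediate data volume
    changed = True
    while changed:
        changed = False
        rounds += 1
        # "Map" step: each node emits its current label to every neighbor.
        inbox = defaultdict(list)
        for node, neighbors in adj.items():
            for nb in neighbors:
                inbox[nb].append(labels[node])
                messages += 1
        # "Reduce" step: each node keeps the minimum label it has seen.
        new_labels = {}
        for node in adj:
            best = min([labels[node]] + inbox[node])
            if best != labels[node]:
                changed = True
            new_labels[node] = best
        labels = new_labels
    return labels, rounds, messages


if __name__ == "__main__":
    labels, rounds, messages = connected_components([(1, 2), (2, 3), (4, 5)])
    print(labels, rounds, messages)
```

For the toy graph above, the path 1-2-3 needs two rounds for label 1 to reach node 3, plus a final round that detects no change. Each round sends one message per edge endpoint, which is the kind of repeated intermediate data the abstract says partition-aware filtering is meant to reduce.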
