Abstract

A connected component in a graph is a set of nodes linked to each other by paths. The problem of finding connected components has been applied to diverse graph analysis tasks such as graph partitioning, graph compression, and pattern recognition. Several distributed algorithms have been proposed to find connected components in enormous graphs. However, these distributed algorithms do not scale well because of unnecessary data I/O and processing, massive intermediate data, numerous rounds of computation, and load-balancing issues. In this paper, we propose PACC (Partition-Aware Connected Components), a fast and scalable distributed algorithm for computing connected components based on three key techniques: two-step processing of partitioning and computation, edge filtering, and sketching. PACC considerably shrinks the size of the intermediate data, the size of the input graph, and the number of rounds without suffering from load-balancing issues. On real-world graphs, PACC runs 2.9 to 10.7 times faster than state-of-the-art MapReduce and Spark algorithms.
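The "numerous rounds" problem is easiest to see in the simple min-label propagation scheme that many iterative distributed connected-component algorithms build on. The Python sketch below is that generic baseline, not PACC: in each synchronous round every node adopts the smallest label among itself and its neighbours, and the loop stops once no label changes. The function name `min_label_propagation` is illustrative, and only nodes that appear in at least one edge are labelled.

```python
from collections import defaultdict


def min_label_propagation(edges):
    """Return (labels, rounds): each node's label (the smallest node id in its
    component) and the number of synchronous rounds run until no label changed."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    labels = {node: node for node in adj}  # every node starts with its own id
    rounds = 0
    while True:
        rounds += 1
        # Synchronous update: each node reads only the previous round's labels.
        new_labels = {
            node: min([labels[node]] + [labels[nbr] for nbr in adj[node]])
            for node in adj
        }
        if new_labels == labels:  # converged: this round changed nothing
            return labels, rounds
        labels = new_labels
```

On a path of 10 nodes, label 0 needs 9 rounds to travel from node 0 to node 9, plus one more round to detect convergence, so the round count of this baseline grows with the graph's diameter. Reducing the number of rounds is one of the goals the abstract sets for PACC.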

Highlights

  • A connected component in a graph is a set of nodes linked to each other by paths

  • PACC, with or without edge filtering and sketching, achieves the best load balancing in every iteration

  • The running time of PACC with edge filtering and sketching drops drastically from one iteration to the next as the graph shrinks

Introduction

A connected component in a graph is a set of nodes linked to each other by paths. Finding connected components is a fundamental graph mining task with various applications, including reachability [1, 2], pattern recognition [3, 4], graph partitioning [5, 6], random walks [7], and graph compression [8, 9]. We propose PACC (Partition-Aware Connected Components), a fast, scalable, and distributed algorithm for computing connected components in an enormous graph. PACC achieves high performance and scalability through three key techniques: two-step processing (partitioning and computation), edge filtering, and sketching. Together, these techniques make PACC distribute workloads evenly, shrink the size of the input and intermediate data, and reduce the number of rounds. The sketching technique reduces the size of the input graph by running a sequential connected component algorithm on each subgraph of the input graph. This paper is an extended version of [12]; in this paper, we newly propose the sketching technique, which improves the performance of the previously proposed method (namely PACC-ef) by reducing the input data size.
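The idea of sketching, running a sequential connected component algorithm on each subgraph to shrink the input, can be illustrated with a minimal Python sketch. This excerpt does not specify how PACC partitions the graph or what it emits per subgraph, so the partition function `part_of` (smaller endpoint modulo k), the union-find routine `local_components`, the function `sketch`, and the star-shaped output (each node linked to the smallest node of its local component) are all hypothetical choices made for illustration. Replacing each local component by a spanning star preserves connectivity, and a subgraph with n nodes and c local components contributes only n - c edges, never more than it started with.

```python
from collections import defaultdict


def local_components(edges):
    """Sequential union-find over one edge list; returns {node: smallest node in its component}."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Attach the larger root under the smaller one, so every root
            # stays the minimum node of its component.
            parent[max(ru, rv)] = min(ru, rv)
    return {x: find(x) for x in list(parent)}


def part_of(u, v, k):
    """Hypothetical partition function: assign an edge by its smaller endpoint."""
    return min(u, v) % k


def sketch(edges, num_parts, part_of):
    """Hypothetical sketching step: split the edges into num_parts subgraphs,
    run the sequential algorithm on each, and replace every local component
    by star edges (local minimum -> member)."""
    parts = defaultdict(list)
    for u, v in edges:
        parts[part_of(u, v, num_parts)].append((u, v))
    sketched = set()
    for part_edges in parts.values():
        for node, root in local_components(part_edges).items():
            if node != root:
                sketched.add((root, node))
    return sketched
```

For example, two cliques on nodes 0 to 4 and 5 to 8 (16 edges in total) sketched with `sketch(edges, 2, part_of)` shrink to 12 edges, and `local_components` returns the same labels for the original and the sketched graph. How much a real input shrinks depends on how many redundant, cycle-forming edges land in the same subgraph.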
