Abstract
The rapid growth in the size of graph data has spurred interest in developing efficient distributed graph processing frameworks. Popular existing frameworks such as GraphLab and Pregel rely on balanced graph partitioning to minimize communication and balance the workload. In this work we contribute to the recent line of research on streaming graph partitioning [30], [31], [34], which computes an approximately balanced k-partitioning of the vertex set of a graph in a single pass over the graph stream, using degree-based criteria. This partitioning framework is well suited to processing large-scale and dynamic graphs. We introduce the use of higher-length walks for streaming graph partitioning and show that they incur only a minor computational cost while they can significantly improve the quality of the partition. We perform an average-case analysis of our algorithm using the planted partition model [7], [25]. We complement the recent results of Stanton [30] by showing that our proposed method recovers the true partition with high probability even when the gap of the model tends to zero as the size of the graph grows. Furthermore, among the many possible choices for the walk length, we show that the proposed length is optimal. Finally, our simulations indicate that the asymptotic results hold even for small graph sizes.
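To make the idea of scoring placements by walks concrete, the following is a minimal Python sketch. It is not the algorithm analyzed in this work. It assumes the vertex-stream model (each vertex arrives together with its neighbor list), a hard per-part capacity, and a score in the style of the linear deterministic greedy (LDG) heuristic, in which direct edges into a part (length-1 walks) are combined with length-2 walks that end in that part. The function name `stream_partition`, the walk weight `beta`, and the capacity and tie-breaking rules are hypothetical choices made for illustration.

```python
from collections import defaultdict


def stream_partition(stream, k, capacity, beta=0.5):
    """Assign each arriving vertex to one of k parts in a single pass.

    stream:   iterable of (vertex, neighbor_list) pairs (vertex-stream model).
    capacity: hypothetical hard cap on the size of each part (balance constraint).
    beta:     hypothetical weight of length-2 walks relative to direct edges.
    """
    adj = defaultdict(set)   # all edges revealed by the stream so far
    part = {}                # vertex -> index of its part
    sizes = [0] * k          # current size of each part

    for v, nbrs in stream:
        # Record the edges revealed by v in both directions.
        for w in nbrs:
            adj[v].add(w)
            adj[w].add(v)

        one = [0] * k  # length-1 walks v -> w with w already placed in part i
        two = [0] * k  # length-2 walks v -> w -> u with u != v placed in part i
        for w in adj[v]:
            if w in part:
                one[part[w]] += 1
            for u in adj[w]:
                if u != v and u in part:
                    two[part[u]] += 1

        best, best_score = None, None
        for i in range(k):
            if sizes[i] >= capacity:
                continue  # part i is full
            # LDG-style multiplicative penalty: favor parts with spare room.
            score = (one[i] + beta * two[i]) * (1 - sizes[i] / capacity)
            # Break ties toward the currently smaller part.
            if (best is None or score > best_score
                    or (score == best_score and sizes[i] < sizes[best])):
                best, best_score = i, score

        if best is None:
            raise ValueError("all parts are full; need k * capacity >= #vertices")
        part[v] = best
        sizes[best] += 1
    return part


# Usage: two disjoint 4-cliques, streamed in interleaved order.
cliques = [[0, 1, 2, 3], [4, 5, 6, 7]]
nbr = {v: [u for u in c if u != v] for c in cliques for v in c}
order = [0, 4, 1, 5, 2, 6, 3, 7]
print(stream_partition(((v, nbr[v]) for v in order), k=2, capacity=4))
```

With k = 2 and capacity 4, each clique ends up in its own part. In this sketch, counting length-2 walks adds work proportional to the degrees of v's neighbors in the revealed graph, on top of the single pass over the stream.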