Abstract

Large-scale graph processing poses a significant challenge due to the irregular structure of graphs. For large graphs with billions of edges, a distributed graph processing framework is generally the better choice. However, traditional iterative computation models such as BSP under-perform in a distributed environment because of large communication overheads and slow iterative convergence. Here we present DistPathGraph, a distributed graph processing framework based on PathGraph. First, considering the differences between a single machine and a cluster, we describe a novel cluster-based partitioning method that differs from PathGraph's. Second, to handle the dependences among vertices and the consistency of data replicas across partitions, we present a scheme that controls the order in which vertices are updated. Finally, we design a message-packing strategy that reduces communication congestion and improves the rate of iterative convergence. In general, a synchronous communication model separates computation and communication steps with barriers, which induces CPU idle time and communication congestion; an asynchronous model can reduce CPU idle time to some extent, but may lead to vertex inconsistency and requires complex control logic. Our strategy is a compromise between the two models. We evaluate our framework against GraphLab. The experimental results validate that our partitioning method and communication strategy improve performance, and that our framework outperforms GraphLab by up to 6.53X.
