A streaming graph partitioning approach on imbalance cluster

Yang Cao, Ruonan Rao

doi:10.1109/icact.2016.7423391

Abstract

Distributed graph computing extracts knowledge by performing computations on large graphs. When the input arrives continuously as a stream, the system is called streaming graph computing. A basic and significant step in computing on large graphs is distributing the graph over a cluster of nodes, which is called ‘partitioning’. If the graph is not partitioned properly, communication quickly becomes the limiting factor in scaling the system, especially in streaming graph computing. In some clusters, moreover, CPU speed and memory size differ from node to node. Observing that in such clusters the nodes with fewer resources limit the overall computing speed, we ask whether the partitioning algorithm can be improved. We propose a simple heuristic for partitioning in such clusters and compare its performance with that of some classic algorithms. It reduces communication cost and makes better use of the nodes that have more resources. Finally, we evaluate the performance gains in imbalanced clusters by using our graph partitioning method for a standard PageRank computation on a large real-world World Wide Web link graph. The results show that, in these circumstances, our heuristic is a significant improvement.
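To make the setting concrete, the following is a minimal sketch of one-pass streaming partitioning on a cluster whose nodes differ in capacity. It is not the heuristic proposed in this paper, which the abstract does not specify. It assumes a generic capacity-weighted greedy rule: each arriving vertex goes to the partition that already holds the most of its neighbors, discounted by how full that partition is. The function name `stream_partition`, the `capacities` list (a stand-in for per-node CPU and memory limits), and the scoring formula are all hypothetical choices made for illustration.

```python
def stream_partition(vertex_stream, capacities):
    """Assign streamed vertices to partitions with a capacity-weighted greedy rule.

    vertex_stream: iterable of (vertex, neighbors) pairs in arrival order.
    capacities: maximum vertex count per partition (assumed proxy for node
                resources; a larger value models a more powerful node).
    Returns a dict mapping each vertex to a partition index.
    """
    assignment = {}
    loads = [0] * len(capacities)

    for vertex, neighbors in vertex_stream:
        # Count neighbors that have already been placed, per partition.
        placed = [0] * len(capacities)
        for n in neighbors:
            p = assignment.get(n)
            if p is not None:
                placed[p] += 1

        best, best_score = None, None
        for p, cap in enumerate(capacities):
            if loads[p] >= cap:
                continue  # this partition is full
            remaining = 1.0 - loads[p] / cap  # fraction of capacity still free
            # Prefer co-location with neighbors, scaled by free capacity;
            # break ties in favor of the relatively emptier partition.
            score = (placed[p] * remaining, remaining)
            if best_score is None or score > best_score:
                best, best_score = p, score

        if best is None:
            raise ValueError("total capacity exceeded")
        assignment[vertex] = best
        loads[best] += 1

    return assignment


# Hypothetical example: partition 0 models a node with twice the resources.
stream = [(1, [2, 3]), (2, [1, 3]), (3, [1, 2]), (4, [5]), (5, [4]), (6, [])]
result = stream_partition(stream, capacities=[4, 2])
```

In this toy run the triangle of vertices 1, 2, and 3 stays together on the larger partition, so none of its edges cross partitions, while the smaller partition receives only as many vertices as its capacity allows. Scaling the neighbor count by free capacity is one simple way to trade communication cost against load; other weightings are equally plausible.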

Full Text