GraphA: Efficient Partitioning and Storage for Distributed Graph Computation

Yiming Zhang, Chengfei Zhang, Dongsheng Li, Ling Liu, Jinyan Wang

doi:10.1109/tsc.2017.2778737

Abstract

Distributed graph computation is central to applications ranging from language processing to social networks. However, natural graphs tend to have skewed power-law degree distributions, in which a small subset of the vertices have a large number of neighbors. As a result, existing graph-parallel systems suffer from load imbalance, high communication cost, and inefficient processing. To address these problems, we present GraphA, an adaptive scheme for efficient large-scale graph computation. At the core of GraphA is an adaptive and uniform graph partitioning algorithm, which partitions the datasets by using an incremental number of mapping functions. GraphA further improves and leverages the ART index structure to realize fine-grained and low-cost graph storage. We have implemented GraphA both on Spark and on GraphLab. Extensive evaluation shows that GraphA significantly outperforms state-of-the-art graph-parallel systems (GraphX and PowerLyra) in ingress time, execution time, and storage cost, for both real-world and synthetic graphs. GraphA achieves up to $7.1\times$ performance improvement over GraphX and 19.7 percent improvement over PowerLyra.
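
The abstract does not spell out the mapping functions, so the following is only an illustrative sketch of how an "incremental number of mapping functions" could counter power-law skew; it is not GraphA's actual algorithm. It assumes edges are anchored at their destination vertex, that the i-th mapping function is a base hash shifted by i, and that the number of functions grows with in-degree. The names `base_hash`, `partition_edges`, and the parameter `threshold` are hypothetical.

```python
import hashlib
from collections import Counter, defaultdict
from math import ceil


def base_hash(vertex, num_parts):
    """Stable hash of a vertex id into [0, num_parts)."""
    digest = hashlib.blake2b(str(vertex).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_parts


def partition_edges(edges, num_parts, threshold):
    """Assign each (src, dst) edge to a partition.

    Assumption (not from the paper): a vertex with in-degree d uses
    k = ceil(d / threshold) mapping functions, capped at num_parts.
    Mapping function i sends dst to (base_hash(dst) + i) mod num_parts,
    and the edges of dst are dealt round-robin over those k functions.
    """
    in_degree = Counter(dst for _, dst in edges)
    seen = defaultdict(int)  # edges already placed per destination
    assignment = []
    for _, dst in edges:
        k = min(num_parts, max(1, ceil(in_degree[dst] / threshold)))
        i = seen[dst] % k  # which mapping function to use for this edge
        seen[dst] += 1
        assignment.append((base_hash(dst, num_parts) + i) % num_parts)
    return assignment


if __name__ == "__main__":
    # A star graph: 1000 edges all pointing at one high-degree hub (vertex 0).
    star = [(src, 0) for src in range(1, 1001)]
    spread = partition_edges(star, num_parts=8, threshold=100)
    print(sorted(Counter(spread).items()))
```

With a single hash function, all 1000 edges of the hub would land in one partition; in this sketch the hub receives eight mapping functions, so its edges are spread evenly over the eight partitions, while low-degree vertices keep all their edges together.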

Full Text