Existing graph processing systems rely on iterative execution with a synchronous (Sync) and/or asynchronous (Async) engine. Both kinds of engine, however, suffer from a wide class of inherent serialization arising from data interdependencies within a graph. In this article, we present SymGraph, a graph engine with symbolic iteration that enables dependent computations on vertices to run in parallel. When the data a computation needs is not yet available, SymGraph lets the computation proceed on an abstract symbolic value instead of the concrete value. To maximize the potential of symbolic iteration, we propose a series of tailored techniques that enable SymGraph to scale out efficiently for large-scale graph processing. We evaluate SymGraph against Sync, Async, and a hybrid of Sync and Async engines. Our results on 12 nodes show that SymGraph outperforms all three graph engines, by 1.93x (vs. Sync), 1.98x (vs. Async), and 1.57x (vs. Hybrid) on average. In particular, the performance of PageRank on 32 nodes improves by 16.5x (vs. Sync), 23.3x (vs. Async), and 12.1x (vs. Hybrid). SymGraph also delivers at least an order-of-magnitude improvement over three specialized graph systems (Naiad, GraphX, and PGX.D).
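To illustrate the general idea of computing on a symbolic value when a concrete one is unavailable, the following is a minimal sketch and not SymGraph's actual implementation. It assumes, purely for illustration, that each vertex update is an affine function of a single upstream vertex; the names `Affine`, `update`, and `compose` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Affine:
    """Hypothetical symbolic value: coeff * value(sym) + const."""
    coeff: float
    const: float
    sym: Optional[str] = None  # None means the value is already concrete

    def resolve(self, known: Dict[str, float]) -> float:
        """Substitute the concrete value of `sym` once it is known."""
        if self.sym is None:
            return self.const
        return self.coeff * known[self.sym] + self.const


def update(weight: float, bias: float, src: str, known: Dict[str, float]) -> Affine:
    """Compute weight * value(src) + bias, symbolically if value(src) is missing."""
    if src in known:
        return Affine(0.0, weight * known[src] + bias)
    return Affine(weight, bias, src)


def compose(weight: float, bias: float, inner: Affine) -> Affine:
    """Apply weight * inner + bias without waiting for inner to be resolved."""
    return Affine(weight * inner.coeff, weight * inner.const + bias, inner.sym)


# Dependency chain a -> b -> c, where the value of a is not yet available.
known: Dict[str, float] = {}
b = update(0.5, 1.0, "a", known)   # b = 0.5 * a + 1, kept symbolic
c = compose(2.0, 3.0, b)           # c = 2 * b + 3 = 1.0 * a + 5, computed without waiting

# Once a arrives, both dependent vertices resolve with one substitution each.
known["a"] = 4.0
b_value = b.resolve(known)
c_value = c.resolve(known)
```

In this sketch, the work for `c` is done before `a` is known, so the dependency chain no longer forces the two updates to run one after the other; only the final substitution waits on the missing value.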