Effective runtime scheduling for high-performance graph processing on heterogeneous dataflow architecture

Qingxiang Chen,Qinggang Wang,Xiaofei Liao,Long Zheng,Hai Jin

doi:10.1007/s42514-020-00041-w

Abstract

Graph processing is widely used in modern society, such as social networks, bioinformatics, and information networks. It is observed that the dataflow architecture has been demonstrated to effectively resolve the challenges of low instruction-level parallelism and branch mispredictions in the existing general-purpose architecture for graph applications. In this paper, toward a customized heterogeneous dataflow architecture that integrates the hardware advantages of both dataflow architecture and traditional control architecture, we propose a novel runtime system that can adaptively offload each subgraph to an appropriate underlying architecture. We also present a hybrid execution model to drive optimal performance. Our implementation on a CPU-FPGA platform shows that our approach achieves 2.2x throughput improvement over a state-of-art CPU-FPGA graph processing accelerator and 2.4x throughput improvement over a state-of-art FPGA-based design.

Full Text