Abstract

Multi-FPGA architectures have attracted great interest for accelerating large-scale graph processing, owing to their potential for high throughput and energy efficiency. As a useful complement, work stealing dynamically balances the computational workload across FPGAs. Unfortunately, existing graph partitioning schemes, originally designed for distributed settings, can be a poor match for work-stealing-enabled multi-FPGA systems, where computation is already balanced but communication overhead becomes unprecedentedly significant. In this paper, we present a two-dimensional balanced graph partitioning scheme for work-stealing-assisted graph processing systems on multi-FPGAs, which reduces communication overhead while preserving the optimal performance of work stealing. Our approach is novel in 1) exploring the tradeoff between the load-balance dimension and the communication dimension in a work-stealing-enabled graph processing system to achieve optimal performance, and 2) optimizing memory access sequences to improve the granularity of graph partitioning for high-throughput graph analytics. Experimental results show that our approach achieves speedups of 1.63x to 2.56x over state-of-the-art FPGA-based graph processing systems.

