Fargraph+: Excavating the parallelism of graph processing workload on RDMA-based far memory system

Jing Wang, Chao Li, Yibo Liu, Taolei Wang, Junyi Mei, Lu Zhang, Pengyu Wang, Minyi Guo

doi:10.1016/j.jpdc.2023.02.015

Abstract

Disaggregated architecture brings new opportunities to memory-consuming applications like graph processing. It allows one to spread memory access pressure from local to far memory, providing an attractive alternative to disk-based processing. Although existing works on general-purpose far memory platforms show great potential for application expansion, it is unclear how graph processing applications could benefit from disaggregated architecture, and how different optimization methods influence the overall performance. In this paper, we take the first step to analyze the impact of graph processing workloads on disaggregated architecture by extending the GridGraph framework on top of an RDMA-based far memory system. We propose Fargraph+, a system with parallel graph data offloading and a far memory coordination strategy for enhancing the efficiency of graph processing workloads on RDMA-based far memory architecture. Specifically, Fargraph+ reduces the overall data movement through a well-crafted, graph-aware data segment offloading mechanism. In addition, we use optimal data segment splitting and asynchronous data buffering to achieve graph iteration-friendly far memory access. We further configure efficient parallelism-oriented control to accelerate multi-threaded processing of graph iterations while improving the memory efficiency of far memory access by utilizing RDMA queue features. We show that Fargraph+ achieves near-oracle performance for typical in-local-memory graph processing systems. Fargraph+ shows up to 11.2× speedup compared to Fastswap, the state-of-the-art, general-purpose far memory platform.

Full Text