Abstract

Graph Neural Networks (GNNs) are of great value in numerous applications and promote the development of cognitive intelligence, owing to their capability of modeling non-Euclidean data structures. However, their inherent irregularity makes GNNs memory-bound, and the hybrid computing paradigm of GNNs poses significant challenges for efficient deployment on existing hardware architectures. Near-Memory Processing (NMP) is a promising solution for alleviating the memory wall problem. In this paper, we present G-NMP, a practical and efficient DIMM-based NMP solution for accelerating GNNs, which, for the first time, accelerates both sparse Aggregation and dense Combination computations on DIMM. We propose a novel G-NMP hardware architecture that efficiently exploits rank-level memory parallelism, together with G-ISA instructions that significantly reduce host memory requests. We apply several data-flow optimizations to G-NMP to improve memory-compute overlap and realize efficient matrix computation. We then develop an adaptive data allocation strategy for diverse vector sizes to further exploit feature-level parallelism. We also propose a novel memory request scheduling method that achieves flexible, low-overhead DRAM ownership transition between the host and G-NMP. Overall, G-NMP achieves consistent performance advantages across diverse GNN models and datasets, delivering 1.46× overall performance and 1.29× energy efficiency on average compared with the state-of-the-art work.
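The hybrid computing paradigm the abstract refers to can be illustrated with a minimal sketch (not the paper's implementation): one GNN layer pairs a sparse, memory-bound Aggregation over the graph's adjacency structure with a dense, compute-bound Combination (a regular GEMM). Names and the toy graph below are illustrative assumptions.

```python
import numpy as np

def gnn_layer(adj, features, weight):
    """One message-passing layer: sparse Aggregation of neighbor
    features, then a dense Combination (linear transform) with ReLU."""
    aggregated = adj @ features       # Aggregation: irregular, memory-bound
    combined = aggregated @ weight    # Combination: regular, compute-bound GEMM
    return np.maximum(combined, 0.0)  # ReLU activation

# Toy 4-node graph: row-normalized adjacency with self-loops.
adj = np.array([[0.50, 0.50, 0.00, 0.00],
                [0.25, 0.25, 0.25, 0.25],
                [0.00, 0.50, 0.50, 0.00],
                [0.00, 0.50, 0.00, 0.50]])
features = np.eye(4)           # 4 nodes, 4-dim one-hot features
weight = np.ones((4, 2))       # project features to 2 dims
out = gnn_layer(adj, features, weight)
print(out.shape)               # (4, 2)
```

In real workloads the adjacency matrix is highly sparse, which is why Aggregation dominates memory traffic while Combination dominates arithmetic; this asymmetry motivates accelerating both phases near memory.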
