Abstract

Graph convolutional networks (GCNs) are a promising approach to machine learning on graph data. GCNs expose vertex-level and intra-vertex parallelism suitable for GPU acceleration, but the irregular memory accesses of their aggregation operations and the inherent sparsity of graph vertex features cause inefficiencies on the GPU. In this paper, we present gPIM, which accelerates GCN inference through a processing-in-memory (PIM) enabled architecture. gPIM performs compute-intensive combination on the GPU, while aggregation and memory-bound combination are offloaded to PIM-featured hybrid memory cubes (HMCs). To maximize the efficiency of this GPU-HMC architecture, gPIM introduces two key designs: 1) a GCN-induced graph partitioning that minimizes communication overhead between cubes, and 2) a programmer-transparent performance estimation mechanism that accurately predicts the performance bound of each operation for workload offloading. Experimental results show that gPIM significantly outperforms an Intel Xeon E5-2680v3 CPU (by 8,979.52×), an NVIDIA Tesla V100 GPU (by 96.01×), and the state-of-the-art GCN accelerator AWB-GCN (by 4.18×).
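As context for the aggregation/combination split the abstract refers to, the sketch below illustrates the standard two-phase form of a GCN layer, X' = Â(XW): a dense, compute-bound combination (suited to the GPU) followed by a sparse, memory-bound aggregation (the phase gPIM offloads to the HMCs). This is not gPIM's code; the function name gcn_layer and the random example inputs are illustrative assumptions.

```python
# Minimal sketch of the two GCN-layer phases discussed in the abstract.
# Not gPIM's implementation; shapes and names are illustrative.
import numpy as np
import scipy.sparse as sp

def gcn_layer(a_hat: sp.csr_matrix, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Combination: dense, compute-bound GEMM (regular accesses, GPU-friendly).
    h = x @ w
    # Aggregation: sparse matrix-dense matrix product with irregular,
    # neighbor-dependent memory accesses (the memory-bound phase).
    return a_hat @ h

# Tiny example: 1,000 vertices, 64 input features, 16 output features.
rng = np.random.default_rng(0)
a_hat = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
x = rng.standard_normal((1000, 64))
w = rng.standard_normal((64, 16))
print(gcn_layer(a_hat, x, w).shape)  # (1000, 16)
```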
