Abstract

Graph convolutional networks (GCNs) are a promising approach to machine learning on graph data. GCNs expose vertex-level and intra-vertex parallelism suitable for GPU acceleration, but the irregular memory accesses of their aggregation operations and the inherent sparsity of graph vertex features cause inefficiencies on the GPU. In this paper, we present gPIM, which accelerates GCN inference through a processing-in-memory (PIM) enabled architecture. gPIM performs compute-intensive combination on the GPU, while aggregation and memory-bound combination are offloaded to PIM-featured hybrid memory cubes (HMCs). To maximize the efficiency of this GPU-HMC architecture, gPIM introduces two key designs: 1) a GCN-induced graph partitioning that minimizes communication overhead between cubes, and 2) a programmer-transparent performance estimation mechanism that accurately predicts the performance bound of each operation for workload offloading. Experimental results show that gPIM significantly outperforms an Intel Xeon E5-2680v3 CPU (by 8,979.52×), an NVIDIA Tesla V100 GPU (by 96.01×), and the state-of-the-art GCN accelerator AWB-GCN (by 4.18×).
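As context for the aggregation/combination split the abstract refers to, the sketch below illustrates the standard two-phase form of a GCN layer, X' = Â(XW): a dense, compute-bound combination (suited to the GPU) followed by a sparse, memory-bound aggregation (the phase gPIM offloads to the HMCs). This is not gPIM's code; the function name gcn_layer and the random example inputs are illustrative assumptions.

```python
# Minimal sketch of the two GCN-layer phases discussed in the abstract.
# Not gPIM's implementation; shapes and names are illustrative.
import numpy as np
import scipy.sparse as sp

def gcn_layer(a_hat: sp.csr_matrix, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Combination: dense, compute-bound GEMM (regular accesses, GPU-friendly).
    h = x @ w
    # Aggregation: sparse matrix-dense matrix product with irregular,
    # neighbor-dependent memory accesses (the memory-bound phase).
    return a_hat @ h

# Tiny example: 1,000 vertices, 64 input features, 16 output features.
rng = np.random.default_rng(0)
a_hat = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
x = rng.standard_normal((1000, 64))
w = rng.standard_normal((64, 16))
print(gcn_layer(a_hat, x, w).shape)  # (1000, 16)
```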
