Abstract

As a new class of graph embedding algorithms, Graph Neural Networks (GNNs) have been widely adopted in many fields. However, GNN computation combines the characteristics of sparse graph processing and dense neural networks, which makes it difficult to deploy efficiently on existing graph-processing or neural-network accelerators. Several GNN accelerators have been proposed recently, but the following challenges remain unsolved: 1) The mini-batch GNN inference scenario offers potential for software-hardware co-design that can reduce the amount of computation by 30%, but this potential is not well exploited; moreover, the cost of Message Flow Graph construction is high and may account for more than 50% of the total latency. 2) Feature aggregation involves a large volume of data access with relatively little computation, resulting in low on-chip data reuse, only 10% of that in dense computation. 3) Without optimization for sparse computing units, a simple memory-bank and crossbar architecture easily leads to bank access conflicts and load imbalance, reducing the utilization of computing units to below 60%. To address these problems, we propose an algorithm-hardware co-design scheme to accelerate GNN inference, which comprises three techniques: 1) A reuse-aware sampling method for mini-batch inference scenarios, which reduces computation by 30% and improves the on-chip reusability of local data. 2) Node-wise parallelism-aware quantization, which quantizes features and weights to 8-bit or 4-bit integers and reduces the amount of memory access by at least 4×. 3) An accelerator supporting the above techniques, in which a sampling-inference integrated architecture supports the different operations, a multi-bank on-chip memory pool enables data reuse, and edge-stream reordering reduces data access conflicts, improving computing-unit utilization by 1.5×. Combining these techniques, experiments show that our design achieves a 9.2× speedup and a 29× energy-efficiency improvement over the Deep Graph Library framework running on servers equipped with CPUs and GPUs.
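To make the quantization claim concrete, the following is a minimal NumPy sketch of per-node symmetric integer quantization. The abstract does not specify the paper's actual node-wise parallelism-aware scheme, so the function name, the row-wise scaling rule, and the parameters here are illustrative assumptions; the sketch only shows why storing 8-bit codes plus one scale per node cuts feature traffic roughly 4× relative to float32.

```python
import numpy as np

def quantize_node_features(features, bits=8):
    """Per-node symmetric quantization (illustrative assumption, not the
    paper's exact scheme).

    features: (num_nodes, dim) float32 array.
    Returns int8 codes and per-node float scales such that
    features ~= codes * scales[:, None].
    """
    qmax = 2 ** (bits - 1) - 1                     # 127 for 8-bit, 7 for 4-bit
    # One scale per node: map each row's max magnitude to qmax.
    scales = np.abs(features).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                      # guard all-zero rows
    codes = np.clip(np.round(features / scales), -qmax, qmax).astype(np.int8)
    return codes, scales.squeeze(1)

# float32 costs 4 bytes per element; int8 costs 1 byte, so feature memory
# traffic drops ~4x (plus one scale per node), consistent with the
# "at least four times" reduction stated in the abstract.
x = np.random.randn(1024, 128).astype(np.float32)
codes, scales = quantize_node_features(x, bits=8)
x_hat = codes.astype(np.float32) * scales[:, None]
print("max reconstruction error:", np.abs(x - x_hat).max())
```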
