Abstract

As Graph Convolutional Networks (GCNs) have emerged as a promising solution for graph representation learning, designing specialized GCN accelerators has become an important challenge. An analysis of GCN workloads shows that the main bottleneck of GCN processing is not computation but the memory latency of intensive off-chip data transfer. Minimizing off-chip data transfer is therefore the primary challenge in designing an efficient GCN accelerator. To address this challenge, we formulate GCN processing as tiled matrix multiplication and optimize off-chip memory access from both the out-of-tile and in-tile perspectives. From the out-of-tile perspective, we find the optimal tile configuration for a given dataset and on-chip buffer capacity, and then examine the dataflow across phases and layers; an inter-layer phase-fusion dataflow with the optimal tile configuration reduces the data transfer of intermediate outputs. From the in-tile perspective, sparsity leaves tiles with redundant data that does not participate in computation, and we eliminate these redundant data loads with hardware support. Finally, we introduce EGCN, an efficient GCN inference accelerator specialized for minimizing off-chip memory access. EGCN achieves a 41.9% reduction in off-chip DRAM accesses, a 1.49× speedup, and a 1.95× improvement in energy efficiency on average over state-of-the-art accelerators.
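To make the tiled-matrix-multiplication view concrete, the following is a minimal NumPy sketch, not the EGCN hardware design: a GCN layer X' = A(XW) computed as two blocked matmul phases, where the tile size T stands in for an on-chip buffer capacity and the all-zero-tile skip illustrates the in-tile redundant-data elimination described above. All names and sizes here are illustrative assumptions.

```python
import numpy as np

def tiled_matmul(P, Q, T=64):
    """Blocked dense matmul; each T x T tile models one on-chip working set."""
    m, k = P.shape
    k2, n = Q.shape
    assert k == k2
    R = np.zeros((m, n))
    for i in range(0, m, T):
        for j in range(0, n, T):
            for l in range(0, k, T):
                # Each tile of P and Q models one fetch from off-chip memory.
                # Skipping all-zero tiles of a sparse matrix (e.g., the graph
                # adjacency matrix A) avoids loading data that never
                # participates in computation.
                p_tile = P[i:i+T, l:l+T]
                if not p_tile.any():
                    continue  # sparse tile: nothing useful to fetch
                R[i:i+T, j:j+T] += p_tile @ Q[l:l+T, j:j+T]
    return R

def gcn_layer(A, X, W):
    # Combination phase (X @ W) followed by aggregation phase (A @ H).
    # This sketch fully materializes the intermediate H; the paper's
    # inter-layer phase fusion is aimed at avoiding spilling such
    # intermediates to DRAM between phases.
    H = tiled_matmul(X, W)
    return tiled_matmul(A, H)

# Toy usage with a randomly generated sparse adjacency matrix.
rng = np.random.default_rng(0)
A = (rng.random((256, 256)) < 0.01).astype(float)  # ~1% dense adjacency
X = rng.random((256, 128))                         # node features
W = rng.random((128, 64))                          # layer weights
out = gcn_layer(A, X, W)
print(out.shape)  # (256, 64)
```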
