Accelerating Graph Computations on 3D NoC-Enabled PIM Architectures

Dwaipayan Choudhury,Partha Pratim Pande,Aravind Rajam,Lizhi Xiang,Anantharaman Kalyanaraman

doi:10.1145/3564290

Abstract

Graph application workloads are dominated by random memory accesses with the poor locality. To tackle the irregular and sparse nature of computation, ReRAM-based Processing-in-Memory (PIM) architectures have been proposed recently. Most of these ReRAM architecture designs have focused on mapping graph computations into a set of multiply-and-accumulate (MAC) operations. ReRAMs also offer a key advantage in reducing memory latency between cores and memory by allowing for PIM. However, when implemented on a ReRAM-based manycore architecture, graph applications still pose two key challenges—significant storage requirements (particularly due to wasted zero cell storage), and significant amount of on-chip traffic. To tackle these two challenges, in this article, we propose the design of a 3D NoC-enabled ReRAM-based manycore architecture. Our proposed architecture incorporates a novel crossbar-aware node reordering to reduce ReRAM storage requirements. Secondly, its 3D NoC-enabled design reduces on-chip communication latency. Our architecture outperforms the state-of-the-art in ReRAM-based graph acceleration by up to 5× in performance while consuming up to 10.3× less energy for a range of graph inputs and workloads.

Full Text