Abstract
Memory-bound sparse gathering, caused by irregular random memory accesses, has become an obstacle in several on-demand applications such as embedding lookup in recommendation systems. To reduce data movement, and thereby better utilize memory bandwidth, previous studies have proposed near-data processing (NDP) solutions. Prior work, however, either minimizes data movement effectively at the cost of limited memory parallelism, or improves memory parallelism to a degree but fails to decrease data movement because it relies on spatial locality (an optimistic assumption) to exploit NDP. More importantly, neither approach proposes a solution for gathering data from random memory addresses; they merely offload operations to NDP. We propose an effective solution for sparse gathering: an efficient near-memory intelligent reduction tree, Fafnir, whose leaves are all the ranks of a memory system and whose nodes gradually apply reduction operations as data is gathered from any rank. Because the tree spans all ranks, Fafnir does not rely on spatial locality; it minimizes data movement by performing entire operations near memory, and it fully benefits from parallel memory accesses through parallel processing at the NDP units. Fafnir offers further advantages: it uses fewer connections (thanks to the tree topology), eliminates redundant memory accesses without costly and less effective caching mechanisms, and is applicable to other sparse-problem domains such as scientific computing and graph analytics. To evaluate Fafnir, we implement it on a Xilinx XCVU9P FPGA and in a 7 nm ASAP ASIC process. Fafnir looks up embedding tables up to 21.3× faster than the state-of-the-art NDP proposal. Furthermore, its generic architecture allows running classic sparse problems on the same 1.2 mm² hardware up to 4.6× faster than the state of the art.
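To make the gather-and-reduce pattern concrete, the sketch below models, in plain Python, the idea the abstract describes: an embedding table sharded across memory ranks (the tree's leaves), each rank pre-reducing only the rows it owns, and internal tree nodes summing partial vectors on the way to the root. This is only an illustrative software model, not the paper's hardware design; all names and sizes (NUM_RANKS, EMB_DIM, ROWS_PER_RANK, the row-wise shard layout) are assumptions made for the example.

```python
# Illustrative software model of a near-memory reduction tree for embedding
# lookup (a sketch, not the Fafnir RTL). Assumed parameters and layout.
import numpy as np

NUM_RANKS = 8         # leaves of the tree: one per memory rank (assumed)
EMB_DIM = 64          # embedding vector width (assumed)
ROWS_PER_RANK = 1024  # table rows stored in each rank (assumed)

# Embedding table sharded row-wise across ranks (illustrative layout).
shards = [np.random.rand(ROWS_PER_RANK, EMB_DIM).astype(np.float32)
          for _ in range(NUM_RANKS)]

def leaf_gather(rank, indices):
    """Model one rank gathering and pre-reducing only the rows it owns."""
    local = [i % ROWS_PER_RANK for i in indices if i // ROWS_PER_RANK == rank]
    if not local:
        return np.zeros(EMB_DIM, dtype=np.float32)
    return shards[rank][local].sum(axis=0)

def tree_reduce(partials):
    """Model the internal nodes: pairwise sums level by level up to the root."""
    while len(partials) > 1:
        partials = [partials[i] + partials[i + 1] if i + 1 < len(partials)
                    else partials[i]
                    for i in range(0, len(partials), 2)]
    return partials[0]

# One sparse lookup: a bag of random row indices pooled into a single vector.
indices = np.random.randint(0, NUM_RANKS * ROWS_PER_RANK, size=32)
pooled = tree_reduce([leaf_gather(r, indices) for r in range(NUM_RANKS)])

# Reference: a plain gather-then-sum over the conceptually unified table.
full_table = np.vstack(shards)
assert np.allclose(pooled, full_table[indices].sum(axis=0), atol=1e-4)
```

In this model, no embedding row ever travels past its owning leaf unreduced; only fixed-size partial sums move up the tree, which is the data-movement saving the abstract attributes to performing the entire reduction near memory rather than relying on spatial locality.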