Algorithm/Architecture Co-Design for Near-Memory Processing

Dmitrii Ustiugov, Alexandros Daglis, Javier Picorel, Nooshin Mirzadeh, Boris Grot, Mario Drumond, Dionisios Pnevmatikatos, Babak Falsafi

doi:10.1145/3273982.3273992

Abstract

With mainstream technologies to couple logic tightly with memory on the horizon, near-memory processing has re-emerged as a promising approach to improving the performance and energy efficiency of data-centric computing. DRAM, however, is primarily designed for density and low cost, with a rigid internal organization that favors coarse-grain streaming rather than byte-level random access. This paper makes the case that treating DRAM as a block-oriented streaming device yields significant efficiency and performance benefits, which motivate algorithm/architecture co-design that favors streaming access patterns, even at the price of higher-order algorithmic complexity. We present the Mondrian Data Engine, which drastically improves the runtime and energy efficiency of basic in-memory analytic operators despite doing more work than traditional CPU-optimized algorithms, which rely heavily on random accesses and deep cache hierarchies.

Full Text