Abstract

In the Internet of Things (IoT) era, data movement between processing units and memory is a critical factor in overall system performance. Processing-in-Memory (PIM) is a promising solution to this bandwidth bottleneck, performing a portion of the computation inside the memory. Many prior studies have enabled various PIM operations on non-volatile memory (NVM) by modifying sense amplifiers (SAs). Because a single SA circuit occupies a much larger area than a 1-bit NVM cell, these designs share one SA among multiple bit-lines (BLs) through a multiplexer (MUX). This limits the parallelism that PIM techniques can ideally achieve. In this paper, we propose MAPIM, a mat-parallelism architecture for high-performance processing in non-volatile memory. Our design serves multiple BL requests under a MUX in parallel with two novel design components: the multi-column/row latch (MCRL) and shared SA routing (SSR). The MCRL allows the address decoder to activate multiple addresses in both the column and row directions by buffering consecutively requested addresses. The activated bits are sensed simultaneously by multiple SAs across a MUX using the SSR technique. The experimental results show that MAPIM is up to $339\times$ faster and $221\times$ more energy efficient than a GPGPU. Compared with state-of-the-art PIM designs, our design is $16\times$ faster and $1.8\times$ more energy efficient, with insignificant area overhead.
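
The sensing-parallelism idea can be illustrated with a small behavioral sketch. The Python snippet below is our own illustration, not code from the paper: the constant `MUX_WIDTH`, the parameter `shared_sas`, and both functions are hypothetical. It contrasts a conventional one-SA-per-MUX array, which must sense requested bit-lines serially, with MAPIM-style operation, where an MCRL-like buffer latches runs of consecutively requested column addresses so that several shared SAs can resolve them in a single activation.

```python
# Hypothetical behavioral sketch of mat parallelism (illustration only,
# not the paper's implementation). A conventional array dedicates one SA
# per MUX, so each requested bit-line costs a sensing cycle. In the
# MAPIM-style model, an MCRL-like buffer groups consecutive addresses
# under the same MUX, and up to `shared_sas` SAs sense them in parallel
# via SSR-style routing.

MUX_WIDTH = 8  # assumed number of bit-lines multiplexed onto one SA


def baseline_cycles(requests):
    """One SA per MUX: every requested bit-line needs its own cycle."""
    return len(requests)


def mapim_cycles(requests, shared_sas=4):
    """MCRL buffers consecutive column addresses under one MUX group;
    up to `shared_sas` of them are sensed in one parallel cycle."""
    cycles = 0
    i = 0
    reqs = sorted(requests)
    while i < len(reqs):
        group = [reqs[i]]
        # Extend the group while addresses stay consecutive, stay within
        # the same MUX group, and free shared SAs remain.
        while (i + 1 < len(reqs)
               and reqs[i + 1] == reqs[i] + 1
               and reqs[i + 1] // MUX_WIDTH == group[0] // MUX_WIDTH
               and len(group) < shared_sas):
            i += 1
            group.append(reqs[i])
        cycles += 1  # the whole buffered group is sensed at once
        i += 1
    return cycles


requests = [0, 1, 2, 3, 8, 9, 10, 11]  # two runs of 4 consecutive BLs
print(baseline_cycles(requests))  # 8 serial sensing cycles
print(mapim_cycles(requests))     # 2 cycles with 4 shared SAs
```

Under these assumptions, the speedup scales with how many consecutive BL requests fall under one MUX and with the number of SAs that SSR can route to them, which is consistent with the parallelism argument the abstract makes.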
