Inner Product Computation In-Memory Using Distributed Arithmetic

Vijaya Lakshmi, John Reuben, Vikramkumar Pudi

doi:10.1109/tcsi.2022.3193678

Abstract

In-memory computing using emerging technologies such as Resistive Random-Access Memory (ReRAM) has been proposed as a promising approach for future computing applications to address the ‘von Neumann bottleneck’. Multiplication is the key component of inner product computation in every digital signal processing (DSP) application, and the complexity of multipliers increases greatly with bit-width. Distributed arithmetic (DA), which uses look-up tables and an adder-shifter module, has been proposed for inner product computation to achieve efficient multiplier-less DSP architectures, particularly when one of the vectors is constant and known in advance. Because of the memory wall, DA can be made even more latency- and energy-efficient when implemented ‘in memory’. In this work, for the first time, we propose two design techniques to compute the inner product completely in memory using DA. This is accomplished by storing the precomputed look-up table contents in a ReRAM array and implementing the adder-shifter module in the same array. The adder-shifter is implemented in memory using majority gates, which are in turn realized as READ operations in the memory array. Two mapping methods, latency-optimized and area-optimized, are presented and compared in terms of latency and area. The proposed method-1 achieves approximately 60% energy savings compared to CMOS, and the proposed method-2 achieves 10.59 times higher throughput compared to CMOS.
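To make the DA principle described above concrete, the following is a minimal software sketch, not the in-memory mapping proposed in the paper. It assumes two's complement inputs of a fixed bit-width and a constant coefficient vector; the function names `build_lut` and `da_inner_product` are hypothetical, and ordinary Python integer addition and shifting stand in for the majority-gate adder-shifter.

```python
def build_lut(coeffs):
    """Precompute the DA look-up table for a constant coefficient vector.

    Entry `addr` holds the sum of all coeffs[k] whose bit k is set in `addr`.
    (Illustrative helper; in the paper this table is stored in a ReRAM array.)
    """
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_inner_product(coeffs, xs, bits):
    """Compute sum(coeffs[k] * xs[k]) without multiplications.

    Assumes every x in `xs` fits in `bits`-bit two's complement.
    """
    lut = build_lut(coeffs)
    acc = 0
    for j in range(bits):
        # Form the LUT address from bit j of every input word.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> j) & 1) << k
        partial = lut[addr]
        # Shift-accumulate; the sign bit carries negative weight
        # in two's complement, so its partial sum is subtracted.
        if j == bits - 1:
            acc -= partial << j
        else:
            acc += partial << j
    return acc


if __name__ == "__main__":
    coeffs = [3, -5, 7, 2]   # constant vector, known in advance
    xs = [4, -3, 1, -8]      # 4-bit two's complement inputs
    print(da_inner_product(coeffs, xs, bits=4))
    print(sum(c * x for c, x in zip(coeffs, xs)))  # direct reference value
```

The table has 2^N entries for N coefficients, so this sketch only illustrates the trade of multipliers for memory look-ups plus shift-and-add steps, one per input bit.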

Full Text