Abstract

Deep Convolutional Neural Networks (CNNs) are widely employed in modern AI systems due to their unprecedented accuracy in object recognition and detection. However, the main bottleneck in improving the performance of large-scale deep CNN hardware implementations has been shown to be the massive data communication between processing units and off-chip memory. In this paper, we pave the way towards a novel concept of an in-memory convolver (IMC) that implements the dominant convolution computation within main memory, based on our proposed Spin Orbit Torque Magnetic Random Access Memory (SOT-MRAM) array architecture, to greatly reduce data communication and thus accelerate Binary CNNs (BCNNs). The proposed architecture can simultaneously work as a non-volatile memory and a reconfigurable in-memory logic (AND, OR) without adding logic circuits to the memory chip, as is done in conventional logic-in-memory designs. The computed logic output can also be read out like a normal MRAM bit-cell using the shared memory peripheral circuits. We employ this intrinsic in-memory processing architecture to process data within memory, greatly reducing the power-hungry, long-distance data communication that burdens state-of-the-art BCNN hardware. The hardware mapping results show that the IMC can process the binarized AlexNet on the ImageNet dataset at 134.27 μJ/img, achieving ∼16x lower energy and 9x lower area compared to an RRAM-based BCNN design. Furthermore, a 21.5% reduction in data movement, in terms of main memory accesses, is observed compared to a CPU/DRAM baseline.
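To make the role of the in-memory AND operations concrete, the following is a minimal software sketch of how a binary convolution can be decomposed into bitwise AND terms followed by a bit-count, assuming a {0,1} encoding of activations and weights. The function name, shapes, and data are illustrative assumptions; in the proposed IMC the AND terms would be generated inside the SOT-MRAM array rather than in software.

```python
import numpy as np

def binary_conv2d_and_popcount(activations, weights):
    """Illustrative sketch: each binary multiply is a bitwise AND and the
    accumulation is a popcount, assuming {0,1}-encoded operands.
    activations: (H, W) binary input feature map
    weights:     (K, K) binary kernel
    Returns an (H-K+1, W-K+1) map of popcount results (valid convolution).
    """
    H, W = activations.shape
    K, _ = weights.shape
    out = np.zeros((H - K + 1, W - K + 1), dtype=np.int32)
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            window = activations[i:i + K, j:j + K]
            # In the proposed IMC these AND terms would be produced inside
            # the SOT-MRAM array; here they are emulated in software.
            and_terms = np.bitwise_and(window, weights)
            out[i, j] = int(and_terms.sum())  # popcount of the AND results
    return out

# Hypothetical usage with a random 8x8 binary map and a 3x3 binary kernel.
rng = np.random.default_rng(0)
fmap = rng.integers(0, 2, size=(8, 8), dtype=np.uint8)
kern = rng.integers(0, 2, size=(3, 3), dtype=np.uint8)
print(binary_conv2d_and_popcount(fmap, kern))
```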
