Abstract

Machine Learning (ML) inference is typically dominated by highly data-intensive Matrix-Vector Multiplication (MVM) computations, which can be constrained by the memory bottleneck caused by massive data movement between processor and memory units. Although analog in-memory computing (IMC) ML accelerators have been proposed to execute MVM with high efficiency, the latency and energy of such analog computing systems can be dominated by the large latency and energy costs of analog-to-digital converters (ADCs). By leveraging sparsity in ML workloads, reconfigurable ADCs can improve MVM energy and latency by reducing the required ADC bit precision. However, this latency improvement can be hindered by non-uniform sparsity in the weight matrices mapped into hardware. Moreover, data movement between MVM processing cores can further degrade overall system-level performance. To address these issues, we propose SAMBA, a Sparsity-Aware IMC-Based Machine Learning Accelerator. First, we propose load balancing during the mapping of weight matrices into physical crossbars, with the goal of accommodating reconfigurable ADCs and avoiding the MVM processing delays caused by non-uniform sparsity in the weight matrices. Second, we propose optimizations in arranging and scheduling the tiled MVM hardware to minimize the overhead of data movement across multiple processing cores. Our evaluations show that the proposed load-balancing technique improves performance by eliminating non-uniformity in the sparsity of mapped matrices. Moreover, we demonstrate that the data movement optimizations further improve both performance and energy efficiency regardless of sparsity conditions.
With the combination of load balancing and data movement optimization in conjunction with reconfigurable ADCs, our proposed approach provides up to 2.38x speed-up and 1.54x better energy efficiency over state-of-the-art analog IMC-based ML accelerators on ResNet-50 with the ImageNet dataset.
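To make the load-balancing idea concrete: the worst-case partial sum on a crossbar bitline grows with the number of active (nonzero) rows, so the tile with the most nonzeros dictates the required ADC resolution. The following is a minimal sketch, not the paper's actual algorithm: it uses a generic greedy bin-packing heuristic to assign rows to tiles, and the function names (`adc_bits`, `greedy_balance`) and the log2-based precision model are illustrative assumptions.

```python
import math

def adc_bits(nnz_rows):
    # Illustrative model: with 1-bit cells and 1-bit inputs, the
    # worst-case bitline partial sum equals the number of nonzero
    # rows summed, so required ADC resolution scales as log2 of it.
    return max(1, math.ceil(math.log2(nnz_rows + 1)))

def greedy_balance(row_nnz, num_tiles):
    # Greedy longest-processing-time heuristic: place rows in
    # descending nonzero count, each into the currently lightest
    # tile, evening out per-tile sparsity (and hence ADC precision).
    tiles = [[] for _ in range(num_tiles)]
    loads = [0] * num_tiles
    for row, nnz in sorted(enumerate(row_nnz), key=lambda x: -x[1]):
        t = loads.index(min(loads))  # lightest tile so far
        tiles[t].append(row)
        loads[t] += nnz
    return tiles, loads

# Example: four rows with very uneven sparsity, split over two tiles.
tiles, loads = greedy_balance([30, 28, 2, 0], num_tiles=2)
print(loads)                 # balanced per-tile nonzero counts
print(adc_bits(max(loads)))  # worst-case ADC precision after balancing
```

With a naive contiguous split, one tile would hold 58 nonzeros (6 ADC bits by this model) while the other holds 2; after balancing, both tiles hold 30, so every ADC can run at 5 bits. This is the sense in which uniform sparsity across tiles lets reconfigurable ADCs keep their precision, and thus latency and energy, low.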
