Compute-in-memory (CIM) is an attractive solution for machine-learning hardware acceleration because it merges computation directly into memory arrays, performing multiply-and-accumulate (MAC) operations in parallel. A primary challenge in reported CIM designs is the analog-to-digital converter (ADC) that digitizes analog MAC values for further processing, which incurs accuracy loss, excessive power dissipation, latency penalties, and area overhead. In this work, we propose ENNA, a novel CIM architecture based on an ADC-free sub-array design that performs inter-array data processing in the analog domain. A lightweight input-encoding scheme based on pulse-width modulation (PWM) is proposed to improve throughput. We taped out a prototype macro in a TSMC 40nm process and validated the proposed ADC-free RRAM array design in silicon. Based on the measured silicon data, we explore system-level performance with the partition between analog and digital processing placed above the sub-array level. The evaluation results show that the proposed accelerator achieves 73.6~86.4 TOPS/W energy efficiency and 2.3~7 TOPS throughput (normalized to binary operations) across various DNN models. Furthermore, we project the proposed design onto a heterogeneous 3D integration (H3D) scheme, showing a <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$3\times \sim 37\times $ </tex-math></inline-formula> throughput improvement depending on the task and ~50% lower area overhead compared with the 2D design.
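To illustrate the idea behind PWM input encoding for a CIM crossbar, the sketch below encodes each input activation as a pulse whose width (number of unit time slots) is proportional to its value, so that the bitline current integrated over time yields the MAC result. This is a minimal conceptual model only; the function names, bit widths, and ideal-integration assumptions are illustrative and do not reflect the actual ENNA circuit design.

```python
def pwm_encode(activation: int, n_bits: int = 4) -> list:
    """Encode an n-bit activation as a unary pulse train of 2**n_bits - 1 slots.

    A pulse is held high for `activation` consecutive unit time slots,
    so pulse width is proportional to the input value (illustrative model).
    """
    max_slots = 2 ** n_bits - 1
    assert 0 <= activation <= max_slots
    return [1] * activation + [0] * (max_slots - activation)


def crossbar_mac(activations, conductances, n_bits: int = 4):
    """Idealized analog MAC: integrate per-slot bitline current over time.

    In each time slot, the bitline current is the sum of conductances of
    rows whose input pulse is high; integrating over all slots gives the
    dot product of activations and conductances (ignoring nonidealities).
    """
    pulses = [pwm_encode(a, n_bits) for a in activations]
    total = 0.0
    for t in range(2 ** n_bits - 1):  # one unit time slot per step
        total += sum(g for p, g in zip(pulses, conductances) if p[t])
    return total


# Example: dot product of activations [3, 5, 2] with conductances [0.5, 1.0, 2.0]
print(crossbar_mac([3, 5, 2], [0.5, 1.0, 2.0]))  # 3*0.5 + 5*1.0 + 2*2.0 = 10.5
```

Note the throughput motivation: a PWM pulse train conveys an n-bit activation in a single analog ramp rather than n separate bit-serial cycles, at the cost of pulse-width precision requirements on the input drivers.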