Deep learning and nonconvex optimization problems are well-known data-intensive workloads. Although graphics processing units (GPUs) have become the mainstream platform for accelerating these algorithms in the cloud, there is growing interest in developing application-specific integrated circuit (ASIC) chips to further improve energy efficiency for such data-intensive workloads. Digital multiply-and-accumulate (MAC) arrays are generally employed in ASIC solutions, and the data flow is often optimized to increase on-chip data reuse. Nevertheless, most inputs and outputs still have to be moved across the MAC arrays and from global buffers. It is therefore more attractive to embed the MAC computations into the memory array itself, known as compute-in-memory (CIM), to minimize data transfer. In CIM, the vector–matrix multiplication is executed in parallel with analog computation: the input vector activates multiple rows simultaneously. Each dot-product term is obtained as the product of an input voltage and a cell conductance, and the partial sum is accumulated as the column current. An analog-to-digital converter (ADC) at the edge of the array then converts the partial sum to binary bits for further digital processing.
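As a minimal behavioral sketch of the CIM operation described above, the following Python model computes the ideal column currents I_j = Σ_i V_i · G[i, j] (Ohm's law per cell, Kirchhoff's current law per column) and then quantizes each partial sum with a simple full-scale ADC model. The function name, parameter values, and full-scale assumption are illustrative, not taken from the paper; nonidealities such as device noise and IR drop are ignored.

```python
import numpy as np

def cim_vmm(V, G, adc_bits=4):
    """Ideal CIM vector-matrix multiply (hypothetical sketch).

    V: input voltage vector, shape (n_rows,)
    G: cell conductance matrix, shape (n_rows, n_cols)
    Returns the ADC-quantized partial sum per column.
    """
    # Analog compute: each column current is the sum of V_i * G[i, j]
    # contributed by all activated rows in parallel.
    I = V @ G

    # ADC at the array edge: quantize each column current to adc_bits.
    # Full-scale range is an assumed upper bound on the column current.
    i_max = V.max() * G.max() * len(V)
    levels = 2 ** adc_bits - 1
    return np.round(I / i_max * levels).astype(int)

# Example: 4 rows activated in parallel, 3 columns read out at once.
rng = np.random.default_rng(0)
V = rng.uniform(0.0, 1.0, size=4)        # input activations as row voltages
G = rng.uniform(0.0, 1e-6, size=(4, 3))  # cell conductances (siemens)
print(cim_vmm(V, G))                     # quantized partial sums per column
```

The quantization step mirrors the role of the ADC in the abstract: the analog partial sum never leaves the array as a high-precision value, only as a small number of binary bits for downstream digital processing.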