Abstract

Deep learning and nonconvex optimization are well-known data-intensive applications. Although graphics processing units (GPUs) have become the mainstream platform for accelerating these algorithms in the cloud, there is growing interest in developing application-specific integrated circuit (ASIC) chips to further improve energy efficiency for such data-intensive workloads. Digital multiply-and-accumulate (MAC) arrays are commonly employed in ASIC solutions, and the data flow is often optimized to increase on-chip data reuse. Nevertheless, most inputs and outputs must still be moved across the MAC arrays and from global buffers. It is therefore more attractive to embed the MAC computation into the memory array itself, known as compute-in-memory (CIM), to minimize data transfer. In CIM, the vector–matrix multiplication is executed in parallel (as analog computation): the input vector activates multiple rows, each dot product is obtained as the product of input voltage and cell conductance, and the partial sum accumulates as the column current. An analog-to-digital converter (ADC) at the edge of the array then converts the partial sum to binary bits for further digital processing.
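The analog CIM operation described above can be illustrated numerically: row input voltages multiply cell conductances (Ohm's law, I = V·G), the column currents accumulate the partial sums, and an ADC digitizes each column. The sketch below is a minimal idealized model; the function name, array sizes, and the uniform full-range ADC are illustrative assumptions, not details from the text.

```python
import numpy as np

def cim_vmm(inputs, conductances, adc_bits=4):
    """Idealized analog CIM vector-matrix multiply.

    Each input voltage drives one row; each cell contributes a
    current I = V * G; the column current is the analog partial sum,
    which an ideal ADC quantizes to adc_bits.
    """
    # Analog partial sums: column currents (dot products of the
    # input vector with each conductance column).
    column_currents = inputs @ conductances
    # Ideal ADC: uniform quantization of the worst-case current
    # range into 2**adc_bits levels (an illustrative assumption).
    i_max = inputs.sum() * conductances.max()
    levels = 2 ** adc_bits - 1
    return np.round(column_currents / i_max * levels).astype(int)

rng = np.random.default_rng(0)
v = rng.random(8)        # input voltages on 8 rows
g = rng.random((8, 4))   # cell conductances of an 8x4 crossbar
print(cim_vmm(v, g))     # one 4-bit ADC code per column
```

In a real array the ADC resolution, cell nonlinearity, and wire resistance all perturb the partial sums; this model captures only the ideal multiply-and-accumulate behavior.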

Highlights

  • To implement CIM, mature SRAM technologies have been proposed

  • Graphics processing units (GPUs) have become the mainstream platform to accelerate these algorithms in the cloud, but there is growing interest in developing application-specific integrated circuit (ASIC) chips to further improve energy efficiency for data-intensive workloads

  • An analog-to-digital converter (ADC) at the edge of the array generally converts the partial sum to binary bits for further digital processing

Introduction

To implement CIM, mature SRAM technologies (possibly with modified bit cells) have been proposed. The topics of interest for this special topic included, but were not limited to: 1) materials and devices that can enable CIM; 2) integration of emerging technologies with silicon for CIM; 3) crossbar array design for CIM; 4) array-level demonstration of CIM; 5) peripheral circuit design for CIM; 6) architecture-level design for CIM; 7) algorithm and hardware co-design for CIM; 8) benchmarking simulators for CIM; 9) new applications for CIM beyond deep learning.

