HERMES-Core—A 1.59-TOPS/mm2 PCM on 14-nm CMOS In-Memory Compute Core Using 300-ps/LSB Linearized CCO-Based ADCs

Riduan Khaddam-Aljameh, Geethan Karunaratne, Urs Egger, Jordi Fornt Mas, Abu Sebastian, Evangelos Eleftheriou, Feng Liu, Nicole Saulnier, Pier Andrea Francese, S R Nandakumar, Manuel Le Gallo, Ishtiaq Ahsan, Kevin Brew, Abhairaj Singh, Theodore Antonakopoulos, Fee Li Lie, V Narayanan, Matthias Brändli, Αναστάσιος Πετρόπουλος, Miloš Stanisavljević, S Choi, V Chan, I Ok, Silvia M Müller

doi:10.1109/jssc.2022.3140414

Abstract

We present a 256 × 256 in-memory compute (IMC) core designed and fabricated in 14-nm CMOS technology with backend-integrated multi-level phase change memory (PCM). It comprises 256 linearized current-controlled oscillator (CCO)-based A/D converters (ADCs) at a compact 4-µm pitch and a local digital processing unit (LDPU) performing affine scaling and ReLU operations. A frequency-linearization technique for the CCO is introduced, which increases the maximum CCO frequency beyond 3 GHz, while ensuring accurate on-chip matrix–vector multiplications (MVMs). Moreover, the design and functionality of the digital ADC calibration procedure are described in detail and the MVM accuracy is quantified. Finally, the measured classification accuracies of deep learning (DL) inference applications on the MNIST and CIFAR-10 datasets, when two IMC cores are employed, are presented. For a performance density of 1.59 TOPS/mm², a measured energy efficiency of 10.5 TOPS/W, at a main clock frequency of 1 GHz, is achieved.
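
As a rough illustration of the ADC principle named in the abstract, the Python sketch below models an idealized CCO-based ADC: the oscillator frequency is assumed to be exactly proportional to the input current (the ideal that linearization aims for), and a counter accumulates full oscillator cycles over a conversion window. The function names, the gain of 100 MHz/uA, and the 10-ns window are hypothetical values chosen for illustration and are not taken from the paper. The sketch also reads the 300-ps/LSB figure under the simplifying assumption that one LSB corresponds to one oscillator period, which would give roughly 3.33 GHz and is consistent with the stated maximum frequency beyond 3 GHz.

```python
def cco_adc_count(current_ua: int, gain_mhz_per_ua: int, window_ps: int) -> int:
    """Idealized CCO-based ADC: count full oscillator cycles in the window.

    Assumes a perfectly linear current-to-frequency characteristic.
    All parameter values used with this function are illustrative only.
    """
    freq_mhz = gain_mhz_per_ua * current_ua
    # 1 MHz * 1 ps = 1e-6 cycles, so divide by 1_000_000 (floor = full cycles).
    return freq_mhz * window_ps // 1_000_000


def lsb_time_to_freq_mhz(lsb_time_ps: float) -> float:
    """Oscillator frequency if one LSB equals one oscillator period (assumption)."""
    return 1_000_000.0 / lsb_time_ps


if __name__ == "__main__":
    # Hypothetical operating point: 30 uA input, 100 MHz/uA gain, 10 ns window.
    print(cco_adc_count(current_ua=30, gain_mhz_per_ua=100, window_ps=10_000))
    # Frequency implied by 300 ps per LSB under the one-period-per-LSB reading.
    print(round(lsb_time_to_freq_mhz(300.0), 1), "MHz")
```

In this model the output code scales with both the input current and the conversion window, which is why a higher maximum oscillator frequency shortens the time needed per LSB.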

Highlights

In-memory computing (IMC) is an emerging non-von Neumann paradigm where computation is performed in the memory array itself [1], [2].
To accelerate the execution of matrix–vector multiplication (MVM) operations using IMC, the memory system must be repurposed into a single instruction multiple data (SIMD) array of processing elements [4], where the input vector data is broadcast across the matrix rows and the various partial products are summed up along a column.
We propose a hardware-centric approach to characterize the quality of the analog MVM operation.
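
The dataflow described in the highlights (inputs broadcast across rows, partial products summed along each column) can be sketched in a few lines of Python. This is only a functional model under ideal assumptions: the `weights` matrix stands in for the stored values, and the function name and example numbers are hypothetical. It does not model device non-idealities or the ADC.

```python
from typing import List, Sequence


def column_sum_mvm(weights: Sequence[Sequence[float]],
                   inputs: Sequence[float]) -> List[float]:
    """Ideal in-memory MVM: input i drives row i, each column sums its products."""
    rows = len(weights)
    cols = len(weights[0])
    if len(inputs) != rows:
        raise ValueError("one input value is required per matrix row")
    outputs = []
    for j in range(cols):
        # Partial products x_i * w_ij accumulate along column j.
        outputs.append(sum(inputs[i] * weights[i][j] for i in range(rows)))
    return outputs


if __name__ == "__main__":
    w = [[1, 2],
         [3, 4]]
    print(column_sum_mvm(w, [1, 1]))
```

In an array of this kind, all columns produce their sums at the same time, so the loop over `j` here corresponds to parallel hardware rather than sequential steps.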

Summary

INTRODUCTION

In-memory computing (IMC) is an emerging non-von Neumann paradigm where computation is performed in the memory array itself [1], [2]. Voltage-based A/D converters (ADCs) are mostly used [31]; these require a voltage-to-current conversion, usually employing a large capacitor for integration [23], [32]. This has so far hampered the realization of large fully-parallel on-chip MVM operations at true O(1) complexity. The proposed core comprises compact, low-latency, and energy-efficient current-controlled oscillator (CCO)-based ADCs, digital readout blocks, and a local digital processing unit (LDPU) performing affine scaling and ReLU operations. The paper also presents a comparison of the proposed PCM-based core with other state-of-the-art IMC designs.
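
The two LDPU operations mentioned above, affine scaling and ReLU, can be illustrated with a minimal Python sketch. It assumes one scale and one offset per column applied to the raw ADC counts; that per-column arrangement, the function name, and the example values are assumptions made for illustration, not details confirmed by this summary.

```python
from typing import List, Sequence


def ldpu_affine_relu(counts: Sequence[float],
                     scales: Sequence[float],
                     offsets: Sequence[float]) -> List[float]:
    """Apply y = max(0, scale * count + offset) to each column output.

    Per-column scale/offset values are an assumption of this sketch.
    """
    if not (len(counts) == len(scales) == len(offsets)):
        raise ValueError("counts, scales and offsets must have equal length")
    results = []
    for count, scale, offset in zip(counts, scales, offsets):
        scaled = scale * count + offset      # affine scaling
        results.append(max(0.0, scaled))     # ReLU
    return results


if __name__ == "__main__":
    # Hypothetical ADC counts for two columns.
    print(ldpu_affine_relu([10, 2], [0.5, 2.0], [-1.0, -10.0]))
```

Placing these steps directly after the ADCs means the column outputs can leave the core already in the form a following neural-network layer expects.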

UNIT CELL AND ARRAY DESIGN

LINEARIZED CCO-BASED ADC

CCO-Based ADC Structure

Linearization Technique

Counter With Variable Increment Size

ADC Calibration Procedure

MVM OPERATION

Multi-Bit and Bit-Serial Input Modulation

APPLICATIONS AND RESULTS

Hardware-Aware Training

Hardware Experiment

System-Level Performance

CONCLUSION

Journal: IEEE Journal of Solid-State Circuits	Publication Date: Apr 1, 2022
Citations: 48	License type: CC BY 4.0
