Abstract

In this paper, we present a 10T SRAM compute-in-memory (CiM) macro that processes multiply-accumulate (MAC) operations between ternary inputs and binary weights. In the proposed 10T SRAM bitcell, charge-domain analog computation is employed to improve the noise tolerance of the bit-line (BL) signals on which the MAC results are represented in the CiM. Parallel processing of the three analog levels of a ternary input activation is also performed within a single proposed 10T bitcell. To reduce the analog-to-digital converter (ADC) bit resolution without sacrificing deep neural network (DNN) accuracy, a confined-slope non-uniform integration (CS-NUI) ADC is proposed, which provides layer-wise adaptive quantization for layers with different MAC distributions. In addition, by sharing the ADC reference-voltage generator across every column of the SRAM array, the ADC area is effectively reduced and the energy efficiency of the CiM is improved. A 256x64 10T SRAM CiM macro with the proposed charge-sharing scheme and CS-NUI ADCs has been implemented in a 28nm CMOS process. Silicon measurement results show that the proposed CiM achieves accuracies of 98.66% on the MNIST dataset with an MLP and 88.48% on the CIFAR-10 dataset with VGGNet-7, with an energy efficiency of 2941 TOPS/W and an area efficiency of 59.584 TOPS/mm^2.
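As a point of reference only, the following is a minimal functional sketch of the ternary-input, binary-weight MAC operation that such a CiM column accelerates; the value encodings {-1, 0, +1} for inputs and {-1, +1} for weights, and all names in the snippet, are illustrative assumptions and do not describe the paper's circuit or encoding.

```python
import numpy as np

def ternary_binary_mac(inputs, weights):
    """Functional model of a ternary-input x binary-weight MAC.

    Assumed encodings (illustrative only, not taken from the paper):
      inputs  : ternary activations in {-1, 0, +1}
      weights : binary weights in {-1, +1}
    Returns the accumulated dot product, i.e. the quantity a CiM
    column would represent as an analog bit-line level before the ADC.
    """
    inputs = np.asarray(inputs)
    weights = np.asarray(weights)
    assert set(np.unique(inputs)) <= {-1, 0, 1}, "inputs must be ternary"
    assert set(np.unique(weights)) <= {-1, 1}, "weights must be binary"
    return int(np.dot(inputs, weights))

# Example: one 256-row column, matching the 256x64 macro dimensions
rng = np.random.default_rng(0)
x = rng.choice([-1, 0, 1], size=256)   # ternary input activations
w = rng.choice([-1, 1], size=256)      # binary weights stored in the bitcells
print(ternary_binary_mac(x, w))        # MAC value later quantized by the column ADC
```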
