Abstract

Energy-efficient processors are crucial for accelerating deep neural networks (DNNs) on edge devices with limited battery capacity. Time-domain computing-in-memory (TD-CIM) is a promising architecture for reducing energy consumption: it achieves low computation energy thanks to the low toggle rate of time-based signals, and low memory-access energy thanks to reduced data movement. Deploying DNNs on TD-CIMs requires quantization, which comes in two types: uniform quantization (UQ) and nonuniform quantization (NUQ). For a given DNN at the same accuracy, NUQ yields a smaller model than UQ. Because weight distributions vary across layers, mixed-precision quantization can further reduce model size without degrading accuracy. However, previous TD-CIMs are inefficient for mixed-precision NUQ-DNNs because their bit-serial convolution significantly increases the number of computations. To address this, we propose a unique-weight convolution that accelerates mixed-precision NUQ-DNNs through a special kernel decomposition, markedly reducing the computation count. Based on this, we design a TD-CIM-based processor, TIMAQ, with three architectural techniques: 1) a bit-cross-flipping-based kernel decomposer to reduce the memory accesses and operations needed to decompose kernels; 2) a dual-mode-complementary predictor to remove redundant computations; and 3) an activation-weight-adaptive pulse quantizer to decrease pulse quantization energy and error. Fabricated in 28-nm CMOS technology and tested on 1–8-b NUQ-DNNs, TIMAQ achieves 2.4–152.7-TOPS/W peak energy efficiency.
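To make the unique-weight idea concrete, the sketch below illustrates in Python how a dot product over a nonuniformly quantized kernel can be decomposed by unique weight values: activations sharing the same quantized weight are accumulated first, and each partial sum is then multiplied by its weight once. This is a minimal software illustration under assumed names and an assumed power-of-two NUQ codebook; the paper's decomposition is realized in hardware by the bit-cross-flipping-based kernel decomposer, not by this code.

```python
import numpy as np

def unique_weight_dot(activations, weights):
    """Dot product decomposed by unique weight values (illustrative).

    Instead of one multiply per element, activations sharing the same
    quantized weight are summed first, and each partial sum is then
    multiplied by its weight once. For a b-bit NUQ kernel there are at
    most 2**b distinct weights, so the multiply count is bounded by the
    codebook size rather than the kernel size.
    """
    out = 0.0
    for w in np.unique(weights):              # few distinct values under NUQ
        out += w * activations[weights == w].sum()
    return out

# Example with a power-of-two NUQ codebook (a common nonuniform scheme,
# assumed here for illustration).
rng = np.random.default_rng(0)
codebook = np.array([-1.0, -0.5, -0.25, 0.0, 0.25, 0.5, 1.0])
weights = rng.choice(codebook, size=64)
activations = rng.standard_normal(64)

assert np.isclose(unique_weight_dot(activations, weights),
                  activations @ weights)
```

Because the number of unique weights is bounded by the NUQ bit-width, the multiply count becomes independent of kernel size, which is the source of the computation reduction relative to bit-serial convolution.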
