Abstract

Owing to the low power consumption of analog delay-based computation, time-domain computing-in-memory (TD-CIM) shows great potential for energy-constrained edge and IoT scenarios that deploy convolutional neural networks (CNNs). However, the latency of delay-based computation is proportional to both the number and the values of the multiply-and-accumulate (MAC) operations, which bottlenecks the throughput of previous data-agnostic TD-CIM processors that compute complete convolutions with a fixed MAC mapping. First, some output activations in each CNN layer contribute little to the final classification result; these insignificant activations can be replaced by sums of partial MACs with only marginal accuracy degradation, so computing complete convolutions performs redundant MACs. Second, activations and weights vary across input images and models, so a fixed MAC mapping yields unbalanced MAC values on the delay chains, causing long idle time and latency. To address these issues, we design a data-aware TD-CIM-based CNN processor, DATIC, with three latency-reduction techniques: 1) a channel-skipping TD-CIM macro that removes redundant MACs for insignificant output activations by keeping activations stationary in SRAM bitcells and shifting in weights to perform only the imperative MACs; 2) a convolution-order programming unit that reduces the overhead of skipping redundant MACs for insignificant output activations located at random positions on the feature maps; 3) an activation-weight-adaptive channel-mapping scheduler that balances the latency of the delay chains by dynamically altering the convolution mapping. Implemented in TSMC 28-nm technology, DATIC achieves a throughput of 622.9 GOPS and an energy efficiency of 32.7 TOPS/W for ResNet-18 with 2-b weights and 8-b activations.
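To make the two data-aware ideas in the abstract concrete, the sketch below models them in NumPy for a toy 1x1 convolution layer: (1) outputs judged insignificant from a partial MAC sum are left as that partial sum rather than fully accumulated, and (2) output channels are assigned to delay chains greedily so that per-chain workloads stay balanced. All names, the number of chains, the significance test, and the use of MAC counts (rather than MAC values) as the balancing metric are illustrative assumptions, not DATIC's actual hardware algorithm.

```python
# Minimal sketch of channel skipping and delay-chain balancing (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Toy 1x1 convolution: 64 input channels, 32 output channels, 8-b activations, 2-b weights.
C_IN, C_OUT, N_CHAINS = 64, 32, 8                   # N_CHAINS: assumed number of delay chains
acts = rng.integers(0, 256, size=C_IN)              # 8-b unsigned activations
weights = rng.integers(-2, 2, size=(C_OUT, C_IN))   # 2-b signed weights

# (1) Channel skipping: judge each output from a partial sum over the first K input
#     channels (K is an assumed split); insignificant outputs keep only the partial MACs.
K = 16
partial = weights[:, :K] @ acts[:K]
significant = np.abs(partial) > 0.5 * np.abs(partial).mean()   # assumed significance test
outputs = partial.astype(np.int64)
outputs[significant] += weights[significant, K:] @ acts[K:]    # finish only significant outputs
macs_per_output = np.where(significant, C_IN, K)

# (2) Adaptive channel mapping: delay-chain latency grows with the MAC workload it is
#     assigned, so map output channels to the currently least-loaded chain, largest first.
chain_load = np.zeros(N_CHAINS, dtype=np.int64)
mapping = {}
for ch in np.argsort(macs_per_output)[::-1]:
    target = int(np.argmin(chain_load))
    chain_load[target] += macs_per_output[ch]
    mapping[int(ch)] = target

print("MACs skipped:", int(C_IN * C_OUT - macs_per_output.sum()))
print("Chain loads:", chain_load)                   # balanced loads -> shorter idle time
```

In the actual processor the abstract describes, chain latency also depends on MAC values, not just MAC counts, and the skip decision is made per output activation by dedicated hardware; the sketch only conveys the scheduling intuition.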
