Abstract

Multiply-and-accumulate (MAC) units, each composed of a set of multipliers and a reduction tree, dominate the latency and power of convolutional neural network (CNN) accelerators. Existing approximate multipliers reduce latency and power at a tolerable drop in accuracy, but they ignore the input data distribution, implicitly assuming that the data are uniformly distributed. This letter shows that the activations and weights of practical CNNs are usually Gaussian-like distributed, so the bits of the quantized activations and weights are typically not 1 with probability 0.5. We therefore propose an approximate MAC unit design that exploits these statistical features of the input data, achieving a balanced tradeoff among latency, power, and accuracy. Extensive experiments show that our proposed MAC unit provides much higher accuracy than state-of-the-art approximate circuits at similar latency, area, and power.
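The claim that quantized bits are not 1 with probability 0.5 can be checked empirically. The sketch below is an illustration, not the letter's method: it draws Gaussian samples, applies a ReLU (as is typical for CNN activations), quantizes them to unsigned 8-bit with an assumed uniform quantizer, and measures how often each bit position equals 1. The clipping range and scale are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical post-ReLU CNN activations: Gaussian samples with the
# negative half clipped to zero.
rng = np.random.default_rng(0)
x = np.maximum(0.0, rng.normal(loc=0.0, scale=1.0, size=100_000))

# Assumed uniform 8-bit quantizer: map [0, 3] (about 3 sigma) to [0, 255].
scale = 3.0 / 255.0
q = np.clip(np.round(x / scale), 0, 255).astype(np.uint8)

# Empirical probability that each bit (bit 0 = LSB ... bit 7 = MSB) is 1.
probs = [((q >> b) & 1).mean() for b in range(8)]
for b, p in enumerate(probs):
    print(f"bit {b}: P(1) = {p:.3f}")
```

Because the post-ReLU values concentrate near zero, the most significant bits are 1 far less than half the time, which is exactly the non-uniformity an input-aware approximate MAC can exploit.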
