Abstract

Multiply-and-accumulate (MAC) units, each composed of a set of multipliers and a reduction tree, dominate the latency and power of convolutional neural network (CNN) accelerators. Existing approximate multipliers reduce latency and power at a tolerable drop in accuracy, but they ignore the input data distribution, implicitly assuming that the data are uniformly distributed. This letter shows that the activations and weights of practical CNNs are usually Gaussian-like distributed, so the bits of the quantized activations and weights are typically not 1 with probability 0.5. We therefore propose an approximate MAC unit design that exploits these statistical features of the input data, achieving a balanced tradeoff among latency, power, and accuracy. Extensive experiments show that our proposed MAC unit provides much higher accuracy than state-of-the-art approximate circuits at similar latency, area, and power.
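The claim that quantized bits are not 1 with probability 0.5 can be checked empirically. The sketch below is an illustration, not the letter's method: it draws Gaussian samples, applies a ReLU (as is typical for CNN activations), quantizes them to unsigned 8-bit with an assumed uniform quantizer, and measures how often each bit position equals 1. The clipping range and scale are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical post-ReLU CNN activations: Gaussian samples with the
# negative half clipped to zero.
rng = np.random.default_rng(0)
x = np.maximum(0.0, rng.normal(loc=0.0, scale=1.0, size=100_000))

# Assumed uniform 8-bit quantizer: map [0, 3] (about 3 sigma) to [0, 255].
scale = 3.0 / 255.0
q = np.clip(np.round(x / scale), 0, 255).astype(np.uint8)

# Empirical probability that each bit (bit 0 = LSB ... bit 7 = MSB) is 1.
probs = [((q >> b) & 1).mean() for b in range(8)]
for b, p in enumerate(probs):
    print(f"bit {b}: P(1) = {p:.3f}")
```

Because the post-ReLU values concentrate near zero, the most significant bits are 1 far less than half the time, which is exactly the non-uniformity an input-aware approximate MAC can exploit.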
