Abstract

Multiply-and-accumulate (MAC) units, each composed of a set of multipliers and a reduction stage, dominate the latency and power of convolutional neural network (CNN) accelerators. Existing approximate multipliers reduce latency and power at a tolerable drop in accuracy, but they do not consider the data distribution (implicitly assuming that data are uniformly distributed). This letter shows that the activations and weights of practical CNNs usually follow Gaussian-like distributions, so the bits of quantized activations and weights typically do not take the value 1 with a probability of 0.5. We therefore propose an approximate MAC unit design that takes the statistical features of the input data into account, achieving a balanced tradeoff among latency, power, and accuracy. Extensive experiments show that our proposed MAC unit design provides much higher accuracy than state-of-the-art approximate circuits, with similar latency, area, and power.
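The claim that quantized bits are not equiprobable can be checked with a short sketch. This is not the paper's method, just an illustration under the assumption that post-ReLU activations are roughly half-Gaussian and are quantized to unsigned 8-bit values:

```python
import numpy as np

# Model post-ReLU activations as half-Gaussian samples (an assumption
# for illustration, not data from the paper).
rng = np.random.default_rng(0)
acts = np.abs(rng.normal(size=100_000))

# Simple uniform quantization to uint8.
scale = acts.max() / 255
q = np.clip(np.round(acts / scale), 0, 255).astype(np.uint8)

# Empirical probability that each bit position is 1.
bit_probs = [((q >> b) & 1).mean() for b in range(8)]
for b, p in enumerate(bit_probs):
    print(f"bit {b}: P(1) = {p:.3f}")
```

Because most samples are small in magnitude, the high-order bits are 1 far less often than 50% of the time, while the low-order bits stay near 0.5. An approximate multiplier tuned for uniformly distributed bits therefore wastes accuracy exactly where Gaussian-like data concentrate, which is the gap the proposed MAC design targets.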
