Weight-Oriented Approximation for Energy-Efficient Neural Network Inference Accelerators

Zois-Gerasimos Tasoulas,Hussam Amrouch,Jorg Henkel,Iraklis Anagnostopoulos,Georgios Zervakis

doi:10.1109/tcsi.2020.3019460

Abstract

Current research in the area of Neural Networks (NN) has resulted in performance advancements for a variety of complex problems. Especially, embedded system applications rely more and more on the utilization of convolutional NNs to provide services such as image/audio classification and object detection. The core arithmetic computation performed during NN inference is the multiply-accumulate (MAC) operation. In order to meet tighter and tighter throughput constraints, NN accelerators integrate thousands of MAC units resulting in a significant increase in power consumption. Approximate computing is established as a design alternative to improve the efficiency of computing systems by trading computational accuracy for high energy savings. In this work, we bring approximate computing principles and NN inference together by designing NN specific approximate multipliers that feature multiple accuracy levels at run-time. We propose a time-efficient automated framework for mapping the NN weights to the accuracy levels of the approximate reconfigurable accelerator. The proposed weight-oriented approximation mapping is able to satisfy tight accuracy loss thresholds, while significantly reducing energy consumption without any need for intensive NN retraining. Our approach is evaluated against several NNs demonstrating that it delivers high energy savings (17.8% on average) with a minimal loss in inference accuracy (0.5%).

Full Text