Abstract
Neural networks generally require significant memory capacity and bandwidth to store and access their large numbers of synaptic weights. This paper presents the design of an energy-efficient neural network inference engine based on adaptive weight compression using the JPEG image encoding algorithm. To maximize the compression ratio with minimal accuracy loss, the quality factor of the JPEG encoder is adaptively controlled according to the accuracy impact of each weight block. At 1% accuracy loss, the proposed approach achieves $63.4\times$ compression for a multilayer perceptron (MLP) and $31.3\times$ for LeNet-5 on the MNIST dataset, and $15.3\times$ for AlexNet and $10.2\times$ for ResNet-50 on ImageNet. The reduced memory requirement leads to higher throughput and lower energy for neural network inference ($3\times$ effective memory bandwidth and $22\times$ lower system energy for the MLP).
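The paper targets a hardware JPEG decoder, but as a rough software sketch of the adaptive quality-factor idea it describes, the following Python snippet uniformly quantizes a weight block to 8 bits, round-trips it through Pillow's JPEG codec, and lowers the quality factor as far as a per-block accuracy budget allows. The block shape, the quality ladder, and the `accuracy_drop` callback are illustrative assumptions, not the authors' implementation.

```python
import io
import numpy as np
from PIL import Image

def quantize_u8(w):
    """Uniformly quantize a float weight block to 8 bits for JPEG encoding."""
    lo, hi = float(w.min()), float(w.max())
    q = np.round((w - lo) / (hi - lo + 1e-12) * 255.0).astype(np.uint8)
    return q, (lo, hi)  # keep (lo, hi) to dequantize after decoding

def jpeg_roundtrip(block_u8, quality):
    """JPEG-encode an 8-bit weight block and decode it back; return the
    reconstructed block and its compressed size in bytes."""
    buf = io.BytesIO()
    Image.fromarray(block_u8, mode="L").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf), dtype=np.uint8), buf.getbuffer().nbytes

def adaptive_quality(block_u8, accuracy_drop, max_drop=0.01,
                     qualities=(5, 10, 25, 50, 75, 95)):
    """Pick the lowest JPEG quality factor whose reconstruction keeps the
    network's accuracy drop within max_drop.

    accuracy_drop(recon) is a hypothetical caller-supplied callback that
    re-evaluates the network with the reconstructed block substituted in
    and returns the resulting accuracy loss (e.g., 0.004 for 0.4%).
    """
    for q in qualities:  # ascending: try the most aggressive setting first
        recon, nbytes = jpeg_roundtrip(block_u8, q)
        if accuracy_drop(recon) <= max_drop:
            return q, recon, nbytes
    return q, recon, nbytes  # fall back to the highest quality tried
```

Blocks whose weights matter less to accuracy tolerate a lower quality factor and thus compress harder, which is what drives the per-network compression ratios reported above.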