Abstract

Reducing energy consumption is key for deep neural networks (DNNs) to ensure usability and reliability, whether they are deployed on low-power end-nodes with limited resources or on high-performance platforms that serve large pools of users. Leveraging the over-parametrization shown by many DNN models, convolutional neural networks (ConvNets) in particular, energy efficiency can be improved substantially while preserving the model accuracy. The solution proposed in this work exploits the intrinsic redundancy of ConvNets to maximize the reuse of partial arithmetic results during the inference stage. Specifically, the weight-set of a given ConvNet is discretized through a clustering procedure such that the largest possible number of inner multiplications falls into predefined bins; this allows an off-line computation of the most frequent results, which in turn can be stored locally and retrieved when needed during the forward pass. Such a reuse mechanism leads to remarkable energy savings with the aid of a custom processing element (PE) that integrates an associative memory with a standard floating-point unit (FPU). Moreover, the adoption of an approximate associative rule based on a partial bit-match increases the hit rate over the pre-computed results, maximizing the energy reduction even further. Results collected on a set of ConvNets trained for computer vision and speech-processing tasks reveal that the proposed associative-based HW-SW co-design achieves up to 77% energy savings with less than 1% accuracy loss.
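A minimal sketch of this mechanism is given below. It is our illustration rather than the authors' implementation: weights are discretized with a plain 1-D k-means, and products between the resulting centroids and a grid of profiled activation levels are precomputed into a table that stands in for the associative memory. The cluster count K = 16, the 256-level activation grid, and all function names are illustrative assumptions.

```python
# Hypothetical sketch of the clustering + precomputed-product idea;
# not the authors' code.
import numpy as np

def cluster_weights(weights, K=16, iters=50):
    """Plain 1-D k-means over the flattened weight set; returns the K
    centroids and, for every weight, the index of its nearest centroid."""
    w = weights.ravel()
    centroids = np.quantile(w, np.linspace(0.0, 1.0, K))  # spread initial centers
    for _ in range(iters):
        idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(K):
            if np.any(idx == k):
                centroids[k] = w[idx == k].mean()
    idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)  # final assignment
    return centroids.astype(np.float32), idx.reshape(weights.shape)

rng = np.random.default_rng(0)
weights = rng.standard_normal((64, 3, 3, 3)).astype(np.float32)  # toy layer
centroids, bins = cluster_weights(weights)

# Off-line step: precompute centroid x activation for a grid of frequent
# activation values (a stand-in for the profiled input activations).
act_levels = np.linspace(0.0, 6.0, 256, dtype=np.float32)
product_table = np.outer(centroids, act_levels).astype(np.float32)  # (K, 256)

def lookup_mul(weight_bin, activation):
    """On-line step: a multiplication becomes a table read when the
    activation snaps to a profiled level; no FPU multiply is issued."""
    j = int(np.abs(act_levels - activation).argmin())
    return product_table[weight_bin, j]
```

On a hit, the table read replaces the FPU multiply; in the proposed PE, a miss would simply fall back to an exact floating-point multiplication.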

Highlights

  • In the last decade, convolutional neural networks (ConvNets) have outclassed traditional machine learning algorithms in several tasks, from image classification [1,2] to audio [3,4] and natural language processing [5,6].

  • The experiments conducted on computer vision tasks and keyword spotting reveal that our approach achieves up to 77% energy savings with a negligible accuracy loss (less than 1%).

  • Razlighi et al. in [18] proposed a look-up search into a special content-addressable memory (CAM) mapped onto a resistive technology as a substitute for multiply-and-accumulate (MAC) units. This approach targeted simple multilayer perceptrons (MLPs), which account for fully connected layers only, while it is known that convolutional layers dominate the energy consumption in ConvNets [41,42].


Summary

Introduction

Convolutional neural networks (ConvNets) have outclassed traditional machine learning algorithms in several tasks, from image classification [1,2] to audio [3,4] and natural language processing [5,6]. This accuracy, however, comes at the cost of a large computational and energy budget, which approximation techniques can reduce. Approximations can be applied at different levels by means of different knobs: (i) the data format, with mini-floats [9,10] or fixed-point quantization [11,12,13]; (ii) the arithmetic precision, replacing exact multiplications with an approximate version [14,15]; (iii) the algorithmic structure, for instance simplifying standard convolutions with an alternative formulation, such as Winograd [16] or frequency-domain convolution [3]. In a typical ConvNet, a final softmax layer calculates the output probability score across the available classes, but the bulk of the workload lies in the convolutional layers. These layers are characterized by stencil loops that update array elements according to fixed patterns, thereby producing repetitive workloads with a high degree of temporal and spatial locality. This offers the opportunity to implement reuse mechanisms that alleviate the computational workload, as illustrated in the sketch below.
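As a concrete illustration of such a reuse mechanism (again a hedged sketch, not the paper's PE design), the following loop memoizes products inside the stencil of a 2-D convolution. A dictionary emulates the associative memory, and the activation key is truncated to its upper bits as a software stand-in for the approximate partial bit-match; MASK_BITS = 12 and the toy kernel are assumptions.

```python
import numpy as np

MASK_BITS = 12  # assumed: low-order activation bits ignored during matching

def approx_key(x):
    """Truncate a float32's bit pattern so that nearby activations collide
    on the same cache entry (software analogue of the partial bit-match)."""
    return int(np.float32(x).view(np.uint32)) & ~((1 << MASK_BITS) - 1)

def conv2d_with_reuse(image, kernel):
    """Valid 2-D convolution that retrieves cached products on a hit and
    falls back to an exact multiply on a miss; returns output and hit rate."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float32)
    cache, hits, total = {}, 0, 0
    for i in range(oh):
        for j in range(ow):
            acc = np.float32(0.0)
            for u in range(kh):
                for v in range(kw):
                    a = image[i + u, j + v]
                    # With clustered weights there are few distinct values
                    # of w, so (weight bin, activation bin) pairs recur often.
                    key = (float(kernel[u, v]), approx_key(a))
                    total += 1
                    if key in cache:
                        acc += cache[key]               # hit: reuse stored product
                        hits += 1
                    else:
                        cache[key] = kernel[u, v] * a   # miss: multiply and store
                        acc += cache[key]
            out[i, j] = acc
    return out, hits / total

# Toy usage: a kernel with only two distinct (clustered) weight values.
img = np.random.default_rng(1).random((32, 32)).astype(np.float32)
ker = np.float32([[0.25, 0.5, 0.25]] * 3)
_, hit_rate = conv2d_with_reuse(img, ker)
```

Widening the mask raises the hit rate (more energy saved) at the price of reusing products computed for nearby activations, which is precisely the energy-accuracy trade-off the co-design explores.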

ConvNets Approximation via Arithmetic Approximation and Data-Reuse
Co-Design Pipeline
Hardware Design
Results
Software Design
Clustering Engine
APMA Engine
Understanding Co-Design Knobs
Simulation Engine
Hardware and Software Setup
Weight Approximation Pipeline
Input Activations Profiling
Approximate Pattern Matching on Input Activation
Energy-Accuracy Trade-Off and Comparison with Previous Works
Conclusions
