Abstract

In this work, we propose AirNN, a framework that enables dynamic approximation of an already-trained convolutional neural network (CNN) in hardware during inference. AirNN approximates the CNN on a per-input basis, saving energy at runtime with little degradation in classification accuracy. For each input, AirNN conducts inference using only a fraction of the CNN's weights, chosen based on that input, while the remaining weights stay at zero. Energy is saved because, for the majority of inputs, fewer weights are fetched from off-chip memory and fewer multiplications are performed. To achieve per-input approximation, we propose a clustering algorithm that groups similar weights in the CNN based on their importance, and design an iterative framework that dynamically decides how many clusters of weights to fetch from off-chip memory for each individual input. We also propose new hardware structures to implement our framework on top of a recently proposed FPGA-based CNN accelerator. In experiments with popular CNNs, AirNN achieves an average energy saving of 49% with less than 3% degradation in classification accuracy, by performing inference with only a fraction of the weights for the majority of inputs. Finally, we propose a greedy interleaving scheme, implemented in hardware, that improves the performance of the iterative procedure and compensates for its latency overhead.
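
To make the per-input approximation concrete, the following is a minimal Python sketch of the iterative idea, assuming weight magnitude as the importance metric and softmax confidence as the stopping criterion; neither assumption is specified in the abstract, and the paper's actual clustering algorithm and stopping rule may differ. The function names, the forward callback, and the confidence_threshold parameter are all hypothetical.

import numpy as np

def cluster_weights_by_importance(weights, n_clusters):
    """Group weights into clusters by magnitude (a simple proxy for
    importance); cluster 0 holds the most important weights."""
    order = np.argsort(-np.abs(weights).ravel())
    # List of flat-index arrays, most important cluster first.
    return np.array_split(order, n_clusters)

def iterative_inference(x, weights, clusters, forward, confidence_threshold=0.9):
    """Fetch one more cluster of weights per iteration (the rest stay
    zero) and re-run inference until the prediction is confident enough."""
    approx = np.zeros_like(weights)
    for k, idx in enumerate(clusters):
        # "Fetch" cluster k from off-chip memory into the approximate model.
        approx.ravel()[idx] = weights.ravel()[idx]
        # Inference with the partial weights; forward returns class probabilities.
        probs = forward(x, approx)
        if probs.max() >= confidence_threshold:  # early exit: input is "easy"
            return probs.argmax(), k + 1         # clusters actually used
    return probs.argmax(), len(clusters)         # hard input: all weights used

In a sketch like this, easy inputs exit after fetching only a few clusters, which is where the savings in off-chip memory traffic and multiplications would come from; hard inputs fall back to the full network, bounding the accuracy loss.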
