Synergistically Exploiting CNN Pruning and HLS Versioning for Adaptive Inference on Multi-FPGAs at the Edge

Guilherme Korol, Michael Guilherme Jordan, Antonio Carlos Schneider Beck, Mateus Beck Rutzig

doi:10.1145/3476990

Abstract

FPGAs, because of their energy efficiency, reconfigurability, and easily tunable HLS designs, have been used to accelerate an increasing number of machine learning applications, especially CNN-based ones. As a representative example, IoT Edge applications, which require low-latency processing of resource-hungry CNNs, offload inferences from resource-limited IoT end nodes to Edge servers featuring FPGAs. However, the ever-increasing number of end nodes puts pressure on these FPGA-based servers, creating new performance and adaptability challenges. While some works have exploited CNN optimizations to alleviate the computation and memory burdens of inference, others have exploited HLS to tune accelerators for statically defined optimization goals. However, these works have not tackled CNN and HLS optimizations together, nor have they provided any adaptability at runtime, where the workload's characteristics are unpredictable. In this context, we propose a hybrid two-step approach that, first, creates new optimization opportunities at design time through the automatic training of CNN model variants (obtained via pruning) and the automatic generation of versions of convolutional accelerators (obtained during HLS synthesis); and, second, synergistically exploits these CNN and HLS optimization opportunities to deliver a fully dynamic Multi-FPGA system that adapts its resources in a fully automatic or user-configurable manner. We implement this two-step approach as the AdaServ Framework and show, through a smart video surveillance Edge application as a case study, that it adapts to the ever-changing Edge conditions: AdaServ processes at least 3.37× more inferences (using the automatic approach) and is at least 6.68× more energy-efficient (using the user-configurable approach) than the original convolutional accelerators and CNN models (VGG-16 and AlexNet).
We also show that AdaServ achieves better results than solutions dynamically changing only the CNN model or HLS version, highlighting the importance of exploring both; and that it is always better than the best statically chosen CNN model and HLS version, showing the need for dynamic adaptability.
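
The two-step idea in the abstract (build a library of CNN variants and HLS versions at design time, then pick among them at runtime) can be illustrated with a minimal Python sketch. This is not the AdaServ implementation: the names `Config`, `build_library`, and `select`, the selection rule, and every number below are hypothetical placeholders chosen only to make the example run.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Config:
    model: str       # pruned CNN variant (hypothetical name)
    hls: str         # HLS accelerator version (hypothetical name)
    accuracy: float  # placeholder accuracy, not a measured value
    fps: float       # placeholder inferences per second
    watts: float     # placeholder power draw

# Hypothetical design-time inputs: (name, accuracy, relative work per inference)
MODELS = [("full", 0.90, 1.0), ("pruned50", 0.85, 0.5)]
# Hypothetical HLS versions: (name, fps on the unpruned model, power in watts)
HLS_VERSIONS = [("small", 20.0, 5.0), ("fast", 40.0, 8.0)]

def build_library(models, hls_versions):
    """Design time: pair every CNN variant with every HLS version."""
    library = []
    for (m, acc, work), (h, base_fps, watts) in product(models, hls_versions):
        # Assumption: throughput scales inversely with the remaining work.
        library.append(Config(m, h, acc, base_fps / work, watts))
    return library

def select(library, demand_fps, objective="accuracy"):
    """Runtime: pick one configuration for the current demand."""
    feasible = [c for c in library if c.fps >= demand_fps]
    if not feasible:
        # Nothing keeps up with the demand: fall back to the fastest option.
        return max(library, key=lambda c: c.fps)
    if objective == "energy":
        # Inferences per joule = fps / watts.
        return max(feasible, key=lambda c: c.fps / c.watts)
    # Default: most accurate feasible option, ties broken by lower power.
    return max(feasible, key=lambda c: (c.accuracy, -c.watts))

if __name__ == "__main__":
    lib = build_library(MODELS, HLS_VERSIONS)
    for demand in (10, 60, 100):
        print(demand, select(lib, demand))
```

With these placeholder numbers, a light load keeps the unpruned model on the smaller accelerator, while a heavier load pushes the selection toward the pruned model on the faster accelerator, which is the kind of joint adaptation the abstract describes.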

Journal: ACM Transactions on Embedded Computing Systems
Publication Date: Sep 17, 2021
Citations: 9

Similar Papers

Olive Disease Classification Based on Vision Transformer and CNN Models
Hamoud Alshammari ... Mahmood A Mahmood
Computational Intelligence and Neuroscience | VOL. 2022
31 Jul 2022

A hybrid model for depression detection using deep learning
Vandana ... Deepti Chaudhary
Measurement: Sensors | VOL. 25
30 Dec 2022

LUT‐DSP usage trade‐off for re‐configurable convolution acceleration core based on small logarithmic floating point representation
Botao Xiong ... Sheng Fan
International Journal of Circuit Theory and Applications | VOL. 52
24 Oct 2023

An Energy-and-Area-Efficient CNN Accelerator for Universal Powers-of-Two Quantization
Tian Xia ... Nanning Zheng
IEEE Transactions on Circuits and Systems I: Regular Papers | VOL. 70
01 Mar 2023
