Abstract

Deploying Deep Neural Networks (DNNs) for IoT Edge applications requires strong skills in both hardware and software. In this paper, a novel, fully automated design framework for Edge applications is proposed to perform such a deployment on System-on-Chips (SoCs). Based on a high-level Python interface that mimics the leading Deep Learning software frameworks, it offers an easy way to implement a hardware-accelerated DNN on an FPGA. To do this, our design methodology covers three main phases: (a) customization, where the user specifies the optimizations needed on each DNN layer; (b) generation, where the framework generates in the Cloud the necessary binaries for both the FPGA and the software parts; and (c) deployment, where the SoC on the Edge receives the resulting files used to program the FPGA, together with the related Python libraries for user applications. Among the case studies, an optimized DNN for the MNIST database runs more than 60× faster than a software version on the ZYNQ 7020 SoC while still consuming less than . A comparison with state-of-the-art frameworks demonstrates that our methodology offers the best trade-off between throughput, power consumption, and system cost.
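The three-phase flow described above lends itself to a Keras-like front end; the following minimal Python sketch illustrates how per-layer customization, Cloud generation, and Edge deployment could be expressed. All names used here (HwLayer, HwModel, the cloud URL, the bitstream file) are hypothetical illustrations under our own assumptions, not the framework's actual API.

# Hypothetical sketch of a Keras-style front end for the three-phase flow:
# (a) customization -> (b) cloud generation -> (c) edge deployment.
# All identifiers are illustrative assumptions, not the framework's real API.

class HwLayer:
    """A DNN layer annotated with per-layer hardware optimizations."""
    def __init__(self, units, quant_bits=8, unroll=1):
        self.units = units            # layer width
        self.quant_bits = quant_bits  # fixed-point quantization (customization phase)
        self.unroll = unroll          # parallelism hint for HLS loop unrolling

class HwModel:
    """Collects layers, requests Cloud generation, and deploys to the SoC."""
    def __init__(self, name):
        self.name = name
        self.layers = []

    def add(self, layer: HwLayer):
        self.layers.append(layer)

    def generate(self, cloud_url):
        # (b) generation: submit the layer description to a Cloud service that
        # runs HLS/synthesis and returns the FPGA bitstream plus Python bindings.
        print(f"Submitting {len(self.layers)} layers of '{self.name}' to {cloud_url}")

    def deploy(self, bitstream="mnist.bit"):
        # (c) deployment: on the Edge SoC, the received bitstream is loaded
        # onto the FPGA before the accelerated layers are used from Python.
        print(f"Programming FPGA with {bitstream}")

# (a) customization: an MNIST-sized model with per-layer quantization choices.
model = HwModel("mnist_mlp")
model.add(HwLayer(128, quant_bits=4, unroll=8))
model.add(HwLayer(10, quant_bits=8))
model.generate("https://example-cloud-service/build")
model.deploy()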

Highlights

  • Over the last few decades, both Artificial Intelligence (AI) and the Internet of Things (IoT) have seen considerable development and adoption in numerous domains [1,2,3,4]. Although they were not originally meant to be merged, some specific applications require the accuracy and performance offered by AI algorithms, notably Deep Neural Networks (DNNs), while being constrained by typical IoT considerations such as low power consumption [5].

  • The main technical difficulties originate (1) from the high computing demand of Deep Neural Network (DNN)-related algorithms, whereas edge and IoT nodes generally offer limited computational power, and (2) from the usually high power consumption, not compatible with the target deployment platform. To solve these problems, dedicated embedded systems have been proposed, based on reconfigurable circuits, namely Field Programmable Gate Arrays (FPGAs), and System-on-Chips (SoCs), complete systems embedded on a single chip, which target the deployment of DNNs for edge computing and the Internet of Things.

  • This challenge has been partly mitigated by the appearance of High-Level Synthesis (HLS) tools, which help to divide the tasks between the CPU and the FPGA in an optimized way, performing so-called hardware acceleration.


Summary

Introduction

Over the last few decades, both Artificial Intelligence (AI) and the Internet of Things (IoT) have seen considerable development and adoption in numerous domains [1,2,3,4]. Although they were not originally meant to be merged, some specific applications require the accuracy and performance offered by AI algorithms, notably Deep Neural Networks (DNNs), while being constrained by typical IoT considerations such as low power consumption [5]. The main technical difficulties originate (1) from the high computing demand of DNN-related algorithms, whereas edge and IoT nodes generally offer limited computational power, and (2) from the usually high power consumption, not compatible with the target deployment platform. This challenge has been partly mitigated by the appearance of High-Level Synthesis (HLS) tools, which help to divide the tasks between the CPU and the FPGA in an optimized way, performing so-called hardware acceleration.
