Abstract
Deep Learning techniques have been successfully applied to many Artificial Intelligence (AI) application problems. However, owing to topologies with many hidden layers, Deep Neural Networks (DNNs) have high computational complexity, which makes their deployment difficult in contexts tightly constrained by requirements such as performance, real-time processing, or energy efficiency. Numerous hardware/software optimization techniques using GPUs, ASICs, and reconfigurable computing (i.e., FPGAs) have been proposed in the literature. With FPGAs, highly specialized architectures have been developed to provide an optimal balance between high speed and low power. However, when targeting edge computing, user requirements and hardware constraints must be met efficiently. Therefore, in this work, we focus exclusively on reconfigurable embedded systems based on the Xilinx ZYNQ SoC and on popular DNNs that can be implemented on embedded edge devices, improving performance per watt while maintaining accuracy. In this context, we propose an automated framework for the implementation of hardware-accelerated DNN architectures. This framework provides an end-to-end solution that facilitates the efficient deployment of topologies on FPGAs by combining custom hardware scalability with optimization strategies. Cutting-edge comparisons and experimental results demonstrate that the architectures developed by our framework offer the best compromise between performance, energy consumption, and system cost. For instance, the low-power (0.266 W) DNN topologies generated for the MNIST database achieved a high throughput of 3,626 FPS.
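The abstract's headline figures (3,626 FPS at 0.266 W) imply an efficiency of roughly 13,600 frames per second per watt; the short sketch below just makes that arithmetic explicit (the comparison baseline values are placeholders, not figures from the paper).

```python
# Efficiency implied by the abstract's reported figures.
throughput_fps = 3626      # frames per second on MNIST (from the abstract)
power_watts = 0.266        # measured power of the generated topology

fps_per_watt = throughput_fps / power_watts
print(f"Efficiency: {fps_per_watt:,.0f} FPS/W")  # ~13,632 FPS/W
```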
Highlights
In the last half-century, much research has focused on building computational models able to exhibit what we call intelligence [1]–[5]
For the reasons explained above, we propose an automated development framework allowing: efficient deployment of Deep Neural Network (DNN) topologies on embedded Field Programmable Gate Arrays (FPGAs) dedicated to Edge Computing; transparent management of design complexity and tradeoffs; combination of custom hardware scalability with flexible optimization strategies; meeting user needs while respecting embedded-system limitations; and specification entry from Python that mimics TensorFlow's customization style
The results show that Caffeine can achieve a peak performance of 365 GOPS on the Xilinx KU060 FPGA and 636 GOPS on the Virtex7 690t FPGA, delivering 7.3× and 43.5× performance and power savings compared to Caffe on a 12-core Xeon server, and 1.5× improved energy efficiency compared to a Graphics Processing Unit (GPU)
Summary
In the last half-century, much research has focused on building computational models able to exhibit what we call intelligence [1]–[5]. We propose an automated development framework allowing efficient deployment of DNN topologies on embedded FPGAs dedicated to Edge Computing; transparent management of design complexity and tradeoffs; combination of custom hardware scalability with flexible optimization strategies; meeting user needs while respecting embedded-system limitations; and specification entry from Python that mimics TensorFlow's customization style. Flexible interfacing alternatives combining stream and memory (off/on chip) deal with latency, further improving throughput and enabling asynchronous data exchange between layers. These techniques and their impact on overall performance and architectural resources will be presented. We propose an automated end-to-end design framework, with parameters (i.e., the balance between pipeline/parallel optimizations and interface flexibility) allowing the user to obtain the best tradeoff for DNN deployment on the Edge (performance, power consumption, and size).
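The summary mentions a Python specification entry that mimics TensorFlow's customization style, plus user-tunable knobs (pipeline/parallel balance, stream vs. memory interfacing). The text does not show the actual API, so the sketch below is purely hypothetical: illustrative names for how such an incremental, Keras-like model build with framework-level tradeoff parameters might look.

```python
# Hypothetical sketch of a Python specification entry for the framework.
# All class and parameter names here are illustrative assumptions, not
# the framework's real API.
from dataclasses import dataclass, field

@dataclass
class Layer:
    kind: str                     # e.g. "conv2d", "dense"
    params: dict = field(default_factory=dict)

@dataclass
class DnnSpec:
    name: str
    layers: list = field(default_factory=list)
    # Tradeoff knobs named in the summary: the balance between
    # pipeline/parallel optimizations, and the interface style
    # (stream vs. off/on-chip memory).
    pipeline_parallel_balance: float = 0.5
    interface: str = "stream"     # or "memory"

    def add(self, kind, **params):
        # Incremental build style, mimicking TensorFlow/Keras chaining.
        self.layers.append(Layer(kind, params))
        return self

# Example: a small MNIST-style topology specification.
spec = (DnnSpec(name="mnist_cnn", interface="stream")
        .add("conv2d", filters=16, kernel=3)
        .add("dense", units=10))
```

Such a front end would let the generator translate each `Layer` entry into a scalable hardware block while the top-level knobs steer the pipeline/parallel and interfacing tradeoffs described above.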