Abstract
Deep Learning techniques have been successfully applied to many Artificial Intelligence (AI) application problems. However, owing to topologies with many hidden layers, Deep Neural Networks (DNNs) have high computational complexity, which makes their deployment difficult in contexts tightly constrained by requirements such as performance, real-time processing, or energy efficiency. Numerous hardware/software optimization techniques using GPUs, ASICs, and reconfigurable computing (i.e., FPGAs) have been proposed in the literature. With FPGAs, highly specialized architectures have been developed to provide an optimal balance between high speed and low power. However, when targeting edge computing, user requirements and hardware constraints must be met efficiently. Therefore, in this work, we focus exclusively on reconfigurable embedded systems based on the Xilinx ZYNQ SoC and on popular DNNs that can be implemented on embedded edge devices, improving performance per watt while maintaining accuracy. In this context, we propose an automated framework for the implementation of hardware-accelerated DNN architectures. This framework provides an end-to-end solution that facilitates the efficient deployment of topologies on FPGAs by combining custom hardware scalability with optimization strategies. Cutting-edge comparisons and experimental results demonstrate that the architectures developed by our framework offer the best compromise between performance, energy consumption, and system cost. For instance, the low-power (0.266 W) DNN topologies generated for the MNIST database achieved a high throughput of 3,626 FPS.
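The abstract's headline figures (3,626 FPS at 0.266 W) imply an efficiency of roughly 13,600 frames per second per watt; the short sketch below just makes that arithmetic explicit (the comparison baseline values are placeholders, not figures from the paper).

```python
# Efficiency implied by the abstract's reported figures.
throughput_fps = 3626      # frames per second on MNIST (from the abstract)
power_watts = 0.266        # measured power of the generated topology

fps_per_watt = throughput_fps / power_watts
print(f"Efficiency: {fps_per_watt:,.0f} FPS/W")  # ~13,632 FPS/W
```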
Highlights
In the last half-century, much research has focused on building computational models able to exhibit what we call intelligence [1]–[5]
For the reasons explained above, we propose an automated development framework allowing: efficient deployment of Deep Neural Network (DNN) topologies on embedded Field Programmable Gate Arrays (FPGAs) dedicated to Edge Computing; transparent management of design complexity and tradeoffs; combination of custom hardware scalability with flexible optimization strategies; meeting user needs while respecting embedded-system limitations; and specification entry from Python that mimics TensorFlow's customization style
The results show that Caffeine can achieve a peak performance of 365 GOPS on the Xilinx KU060 FPGA and 636 GOPS on the Virtex7 690t FPGA, delivering 7.3× and 43.5× performance and power savings compared to Caffe on a 12-core Xeon server, and 1.5× improved energy efficiency compared to a Graphics Processing Unit (GPU)
Summary
In the last half-century, much research has focused on building computational models able to exhibit what we call intelligence [1]–[5]. We propose an automated development framework allowing efficient deployment of DNN topologies on embedded FPGAs dedicated to Edge Computing; transparent management of design complexity and tradeoffs; combination of custom hardware scalability with flexible optimization strategies; meeting user needs while respecting embedded-system limitations; and specification entry from Python that mimics TensorFlow's customization style. Flexible interfacing alternatives combining stream and memory (off/on chip) deal with latency, further improving throughput and enabling asynchronous data exchange between layers. These techniques and their impact on overall performance and architectural resources will be presented. We propose an automated end-to-end design framework, with parameters (i.e., the balance between pipeline/parallel optimizations and interface flexibility) allowing the user to obtain the best tradeoff for DNN deployment on the Edge (performance, power consumption, and size).
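The summary mentions a Python specification entry that mimics TensorFlow's customization style, plus user-tunable knobs (pipeline/parallel balance, stream vs. memory interfacing). The text does not show the actual API, so the sketch below is purely hypothetical: illustrative names for how such an incremental, Keras-like model build with framework-level tradeoff parameters might look.

```python
# Hypothetical sketch of a Python specification entry for the framework.
# All class and parameter names here are illustrative assumptions, not
# the framework's real API.
from dataclasses import dataclass, field

@dataclass
class Layer:
    kind: str                     # e.g. "conv2d", "dense"
    params: dict = field(default_factory=dict)

@dataclass
class DnnSpec:
    name: str
    layers: list = field(default_factory=list)
    # Tradeoff knobs named in the summary: the balance between
    # pipeline/parallel optimizations, and the interface style
    # (stream vs. off/on-chip memory).
    pipeline_parallel_balance: float = 0.5
    interface: str = "stream"     # or "memory"

    def add(self, kind, **params):
        # Incremental build style, mimicking TensorFlow/Keras chaining.
        self.layers.append(Layer(kind, params))
        return self

# Example: a small MNIST-style topology specification.
spec = (DnnSpec(name="mnist_cnn", interface="stream")
        .add("conv2d", filters=16, kernel=3)
        .add("dense", units=10))
```

Such a front end would let the generator translate each `Layer` entry into a scalable hardware block while the top-level knobs steer the pipeline/parallel and interfacing tradeoffs described above.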