Abstract

In recent years, with the rapid development of deep neural networks (DNNs), algorithmic complexity in fields such as computer vision and natural language processing has been increasing rapidly. FPGA-based DNN accelerators have demonstrated superior flexibility and performance, with higher energy efficiency than high-performance devices such as GPUs. However, the computing resources of a single FPGA are limited, making it difficult to flexibly meet the throughput and energy-efficiency requirements of workloads at different computing scales. This paper therefore proposes a DNN implementation method based on a scalable heterogeneous FPGA cluster that adapts to different tasks while achieving high throughput and energy efficiency. First, the method divides a single large task into multiple modules and runs each module on a different FPGA, forming a pipeline across the boards. Second, a dichotomy (binary-search) based task deployment method is proposed to balance the execution times of the pipeline stages as evenly as possible, improving throughput and energy efficiency. Third, the DNN computing modules are optimized according to the relationship between computing power and bandwidth, improving energy efficiency by reducing wasted resources and increasing resource utilization. Experimental results on AlexNet and VGG-16 show that a Zynq 7035 cluster achieves up to 25.23× the energy efficiency of an optimized AMD AIO processor. Compared with previous single-FPGA and FPGA-cluster designs, energy efficiency is improved by 59.5% and 18.8%, respectively.
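The abstract does not spell out the dichotomy-based deployment method; the following is a minimal sketch of one plausible interpretation, assuming per-layer execution-time estimates are available: binary search over a per-stage latency bound, greedily packing consecutive layers into pipeline stages until the layers fit on the available boards. The function and variable names (`partition_layers`, `layer_times`, `num_fpgas`) are hypothetical and not taken from the paper.

```python
from typing import List

def partition_layers(layer_times: List[float], num_fpgas: int) -> List[List[int]]:
    """Binary-search the per-stage latency bound, then greedily pack
    consecutive layers into pipeline stages so no stage exceeds it.

    layer_times: estimated execution time of each DNN layer (assumed
    profiling input); num_fpgas: boards available in the cluster.
    """
    lo, hi = max(layer_times), sum(layer_times)

    def stages_for(bound: float) -> List[List[int]]:
        # Greedy packing: start a new stage whenever the bound would be exceeded.
        stages, current, acc = [], [], 0.0
        for i, t in enumerate(layer_times):
            if current and acc + t > bound:
                stages.append(current)
                current, acc = [], 0.0
            current.append(i)
            acc += t
        stages.append(current)
        return stages

    # Shrink the bound while the layers still fit on the available boards.
    for _ in range(50):  # fixed iteration count is enough for float convergence
        mid = (lo + hi) / 2
        if len(stages_for(mid)) <= num_fpgas:
            hi = mid
        else:
            lo = mid
    return stages_for(hi)


if __name__ == "__main__":
    # Example: eight layers mapped onto a three-board pipeline.
    times = [1.2, 0.8, 2.5, 0.6, 1.9, 0.7, 1.1, 0.4]
    for stage, layers in enumerate(partition_layers(times, 3)):
        print(f"FPGA {stage}: layers {layers}")
```

Balancing the bound this way minimizes the slowest stage, which sets the pipeline's throughput once all boards run concurrently.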
