Abstract
Thanks to the rapid development of the latest Field Programmable Gate Arrays (FPGAs), the performance bottleneck of deep learning hardware accelerators has shifted to computing capability. In this paper, a novel FPGA-based Convolutional Neural Network (CNN) accelerator architecture, named the Effective Pipeline Architecture (EPA), is proposed to optimize resource usage in the implementation of CNN computation. Because the unique storage strategies, which contain many creative design details, are adopted and optimized for different CNN models and layers, high DSP computing efficiency is achieved in the fine-grained pipeline. Moreover, compared with traditional architectures, twice the throughput for general matrix multiplication is realized across a large number of parallel DSP48E resources through kernel combination and data scheduling. As a result, the implementation of Yolov2-Tiny achieves 873 Giga Operations Per Second (GOPS) with 902 DSPs at 67 Frames Per Second (FPS), and the computing efficiency in most layers exceeds 90%, which improves calculation performance and efficiency compared with previous designs and is significant for meeting the increasing computing requirements.