Abstract

Convolution is the most important operation in convolutional neural networks (CNNs). FPGA-based CNN accelerators must carefully optimize the convolution loops to achieve high performance. This work analyzes convolution loop optimization in detail, exploiting loop tiling, loop unrolling, and loop interchange to design the accelerator's dataflow. It quantitatively evaluates strategies for data reuse and resource utilization, combining fixed and dynamic parallelism to design a high-performance adaptive accelerator. The proposed accelerator is evaluated on a ZCU102 FPGA by implementing a five-layer CNN whose convolution layers differ greatly in size. It achieves more than a 1.14x improvement in throughput efficiency over prior accelerators, while consuming less than half the logic resources of prior accelerators with a similar amount of computing resources.
