Abstract

Convolutional neural networks (CNNs) have achieved great success in numerous areas and have sparked increasing interest in accelerating CNNs on hardware such as FPGAs. However, designing an efficient FPGA implementation of a CNN application requires a long development time and a strong background in hardware details. Consequently, an easy-to-use yet powerful framework for automatic CNN design optimization is needed. In this work, we propose a collaborative framework that models and optimizes OpenCL-based FPGA designs for CNN applications according to the device resource limits and the CNN specification. The framework consists of three main parts: LoopTree, a novel data structure we propose to capture the structure of an OpenCL-based CNN design; a LoopTree-based coarse-grained model, which estimates the performance of the CNN design at the module level; and a source-code-based fine-grained model, which estimates the performance of the CNN design in a cycle-accurate manner. Efficient designs are obtained by combining the two models in a search-and-refine manner. A variety of OpenCL-based designs have been implemented on board to verify our framework. The results show that our coarse-grained and fine-grained models have average estimation errors of 10.2% and 4.7%, respectively. Both are much lower than the errors of the prevalent approach, which estimates performance from operation statistics using a predefined formula for specific loop schedules.
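The abstract does not specify what a LoopTree node stores or how the coarse-grained model computes its estimate. Purely as an illustration of the general idea of a module-level estimate over a loop nest, here is a minimal sketch. It assumes each node is one loop level with a trip count, an optional pipelining flag with an initiation interval (II), and a fixed straight-line latency for its own body. `LoopNode`, its fields, and `estimate_cycles` are hypothetical names, and the formula is a textbook HLS-style loop latency approximation, not the authors' model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoopNode:
    """Hypothetical loop-tree node: one loop level of an OpenCL kernel (assumed fields)."""
    name: str
    trip_count: int
    pipelined: bool = False  # assumption: a loop may be pipelined with a fixed II
    ii: int = 1              # initiation interval in cycles, used only when pipelined
    body_latency: int = 0    # cycles of straight-line work directly in this loop's body
    children: List["LoopNode"] = field(default_factory=list)


def estimate_cycles(node: LoopNode) -> int:
    """Coarse cycle estimate for a loop nest (textbook approximation, not the paper's model)."""
    # Latency of one iteration: this loop's own work plus each child loop, run one after another.
    iteration_latency = node.body_latency + sum(estimate_cycles(c) for c in node.children)
    if node.pipelined:
        # Pipelined: the first iteration takes the full depth; each later one starts II cycles after.
        return (node.trip_count - 1) * node.ii + iteration_latency
    # Not pipelined: iterations run back to back without overlap.
    return node.trip_count * iteration_latency


# Hypothetical convolution-layer nest: 64 output channels, each with a pipelined loop over 196 pixels.
pixel_loop = LoopNode("pixel", trip_count=196, pipelined=True, ii=1, body_latency=10)
channel_loop = LoopNode("out_ch", trip_count=64, children=[pixel_loop])
print(estimate_cycles(channel_loop))
```

With these made-up numbers, the pipelined inner loop costs (196 - 1) * 1 + 10 = 205 cycles and the sequential outer loop costs 64 * 205 = 13120 cycles. A tree-shaped representation makes this kind of bottom-up, per-module estimate a simple recursive walk, which is cheap enough to call many times during a design-space search.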

