Abstract

Convolutional neural networks (CNNs) now deliver state-of-the-art performance in computer vision tasks such as image classification. As CNNs grow deeper, it becomes more difficult to implement CNN applications on general-purpose computing platforms. Many FPGA-based CNN accelerators have recently been proposed; they achieve high performance on specific CNN models, but they lack the reconfigurability needed to fit different applications. To address this problem, this paper proposes an end-to-end acceleration framework that consists of a parameterized hardware accelerator and a fully automatic software framework. Parallel computation and pipeline optimization are used in the hardware design to achieve high performance, and runtime reconfigurability is implemented through a global register list. By encapsulating the underlying driver, a three-layer software framework lets users deploy their pre-trained models. A typical CNN model for handwritten digit recognition was selected to test and verify the accelerator. The experimental results show that the accelerator reaches a recognition speed of 22.65 FPS at a clock frequency of 100 MHz. Compared with an ARM Cortex-A9 running at 650 MHz, it achieves a 25.9× speedup while consuming only 1.59 W.
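
The reported figures can be related to one another with a little arithmetic. The sketch below is illustrative only: it assumes that the 25.9× figure is a throughput ratio measured on the same model, and that the 1.59 W is drawn while the accelerator sustains 22.65 FPS. Neither assumption is stated in the abstract, and the variable names are ours.

```python
# Figures quoted in the abstract.
fps_fpga = 22.65   # accelerator throughput, frames per second
speedup = 25.9     # reported acceleration over the ARM Cortex-A9
power_w = 1.59     # reported power consumption, watts

# Assumption: the speedup is a throughput ratio on the same model,
# so the ARM baseline throughput is implied rather than reported.
fps_arm = fps_fpga / speedup

# Per-frame latency in milliseconds for both platforms.
latency_fpga_ms = 1000.0 / fps_fpga
latency_arm_ms = 1000.0 / fps_arm

# Assumption: 1.59 W is drawn while sustaining 22.65 FPS,
# which gives the energy spent per recognized frame.
energy_per_frame_j = power_w / fps_fpga

print(f"implied ARM throughput: {fps_arm:.4f} FPS")
print(f"FPGA latency: {latency_fpga_ms:.2f} ms, ARM latency: {latency_arm_ms:.2f} ms")
print(f"energy per frame: {energy_per_frame_j * 1000:.1f} mJ")
```

Under these assumptions the ARM baseline processes a little under one frame per second, and the accelerator spends roughly 70 mJ per recognized frame.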
