Abstract

Convolutional neural networks (CNNs) now deliver state-of-the-art performance in fields such as computer vision and image classification. As CNNs grow deeper, it becomes more difficult to implement CNN applications on general-purpose computing platforms. Many FPGA-based CNN accelerators have recently been proposed. These accelerators achieve high performance on specific CNN models, but they lack the reconfigurability needed to fit different applications. To address this problem, this paper proposes an end-to-end acceleration framework that consists of a parameterized hardware accelerator and a fully automatic software framework. Parallel computation and pipeline optimization are used in the hardware design to achieve high performance, and runtime reconfigurability is implemented through a global register list. By encapsulating the underlying driver, a three-layer software framework lets users deploy their pre-trained models. A typical CNN model for handwritten digit recognition was selected to test and verify the accelerator. The experimental results show that the accelerator reaches a recognition speed of 22.65 FPS at a clock frequency of 100 MHz. Compared with an ARM Cortex-A9 running at 650 MHz, it achieves a 25.9-fold speedup while consuming only 1.59 W.
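The reported figures imply a few derived quantities that the abstract does not state. The short Python sketch below works them out, assuming the 25.9-fold acceleration refers to recognition throughput and that the 1.59 W figure is the average power drawn during inference. The function name `derived_metrics` and the dictionary keys are illustrative only and do not come from the paper.

```python
# Figures quoted in the abstract.
ACCEL_FPS = 22.65   # accelerator recognition speed at 100 MHz
SPEEDUP = 25.9      # acceleration relative to the ARM Cortex-A9 at 650 MHz
POWER_W = 1.59      # accelerator power consumption in watts


def derived_metrics(fps, speedup, power_w):
    """Derive latency, baseline throughput, and energy per frame.

    Assumes the speedup is a throughput ratio and that power is the
    average draw during inference (both are assumptions, not stated
    in the abstract).
    """
    baseline_fps = fps / speedup
    return {
        "accel_latency_ms": 1000.0 / fps,
        "baseline_fps": baseline_fps,
        "baseline_latency_ms": 1000.0 / baseline_fps,
        "energy_per_frame_j": power_w / fps,
    }


if __name__ == "__main__":
    for name, value in derived_metrics(ACCEL_FPS, SPEEDUP, POWER_W).items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions, the accelerator takes roughly 44 ms per frame, the ARM baseline runs at a little under 1 FPS, and each recognition costs about 0.07 J on the accelerator.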

