Abstract

Convolutional neural networks (CNNs) have proven to be an effective method in the field of artificial intelligence (AI), and deploying CNNs to embedded devices at scale will undoubtedly promote the development and application of AI in practical industry. However, mainly due to the space-time complexity of CNNs, computing power, memory bandwidth, and flexibility are performance bottlenecks. In this paper, a framework combining model compression and hardware acceleration is proposed to solve these problems. The framework consists of a mixed pruning method, data storage optimization for efficient memory utilization, and an accelerator for mapping CNNs onto a field-programmable gate array (FPGA). The mixed pruning method compresses the model, and data quantization reduces the data bit-width to 8 bits. The FPGA-based accelerator makes CNN implementation flexible, configurable, and efficient. The model compression is evaluated on an NVIDIA RTX 2080 Ti; the results show that VGG16 is compressed by 30× and the fully convolutional network (FCN) by 11× within 1% accuracy loss. The compressed model is deployed and accelerated on a ZCU102, achieving up to 1.7× and 24.5× better energy efficiency than the RTX 2080 Ti and the Intel i7-7700, respectively.
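
To illustrate the 8-bit quantization step mentioned above, the sketch below shows symmetric per-tensor quantization of floating-point weights to int8. The abstract does not specify the exact quantization scheme used in the paper, so the scaling rule here (mapping the maximum absolute weight to 127) and the function name are assumptions for illustration only.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of float weights to int8.

    Returns the int8 tensor and the scale needed to dequantize:
    weights ≈ q.astype(np.float32) * scale
    (Illustrative sketch; the paper's actual scheme may differ.)
    """
    # Choose the scale so the largest-magnitude weight maps to the int8 range.
    max_abs = np.max(np.abs(weights))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

# Usage: quantize a random 64x3x3x3 convolution kernel and check the error.
w = np.random.randn(64, 3, 3, 3).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale
print("max quantization error:", np.max(np.abs(w - w_hat)))
```

Reducing weights and activations from 32-bit floats to 8-bit integers cuts memory traffic by roughly 4× and lets the FPGA accelerator use narrow integer multipliers, which is what makes the data storage optimization and energy-efficiency gains reported above possible.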
