Abstract

The development of Convolutional Neural Networks (CNNs) has contributed to breakthroughs in the field of artificial intelligence. Compared with traditional algorithms, CNNs offer advantages in both speed and accuracy for detection, identification, and classification tasks. GPUs are popular for implementing CNNs on account of their computational capacity; however, their high power consumption limits their use in the embedded field. Recently, researchers have accelerated CNNs using Field Programmable Gate Arrays (FPGAs), which have been demonstrated to be more energy-efficient than GPUs and are well suited to embedded systems. Although FPGAs offer low power consumption, powerful parallel computing, and high flexibility, bandwidth and memory access have become the bottleneck of CNN accelerator design. In this paper, a novel memory-optimized and energy-efficient CNN accelerator architecture is proposed. The paper analyzes the on-chip and off-chip memory resources of the FPGA and proposes a memory optimization solution based on a specially mixed operation of FIFO and ping-pong buffering. To ensure accuracy, a float-16 CNN model is used to test the framework, which is evaluated on the Xilinx ZCU102 platform, a device that integrates both an Arm core and FPGA fabric on one chip. Tested on VGG-16 and an FCN with 500 MB of weights, the architecture is 10 times faster than a CPU and more energy-efficient than a GPU.
