Efficient CNN Accelerator on FPGA

S Kala,S Nalesh

doi:10.1080/03772063.2020.1821797

Abstract

Convolutional neural networks (CNNs) are classical models for computer vision and machine learning applications such as video surveillance, pattern recognition, weather forecasting, traffic, and safety. CNNs involve computationally intensive operations and require huge off-chip memory bandwidth, which makes it a challenging task to deploy on real-time embedded systems. Compared to central processing units and graphic processing units, field programmable gate arrays (FPGA)-based CNNs are gaining popularity owing to their flexibility and efficiency. In this work, we present an efficient CNN accelerator based on blocked Winograd-GEMM architecture with high performance. We implement ResNet-18 CNN model on XC7VX690T FPGA using proposed architecture. This implementation operates at a clock frequency of 200 MHz and gives average throughput of 383 GOPS which is comparable to other state-of-art implementations. This manuscript is an extended version of [S. Kala, J. Mathew, B. R. Jose, and S. Nalesh, “UniWiG: Unified Winograd-GEMM Architecture for Accelerating CNN on FPGAs,” in 2019 32nd International Conference on VLSI Design and 2019 18th International Conference on Embedded Systems (VLSID), Delhi, NCR, India, 2019, pp. 209–214. DOI: 10.1109/VLSID.2019.00055.].

Full Text