Abstract

Convolutional Neural Networks (CNNs) have demonstrated impressive performance across a wide range of applications in recent years. However, deploying CNNs for inference on resource-constrained edge devices remains challenging due to their computation, memory, energy, and bandwidth requirements. To address these issues, FPGAs are commonly used to implement CNNs because of their high flexibility and low power consumption. The Winograd convolution algorithm can further reduce the computation required by a convolution operation. This paper proposes the Winograd Offline-Runtime Decomposition Algorithm (WORDA), an efficient approach to performing Winograd convolution that achieves low computation latency. In this work, WORDA is used to design the convolution layers of CNN accelerators on FPGA for two CNN architectures, LeNet and AlexNet, using Vivado HLS (High-Level Synthesis). A comparison with the state of the art shows a 58.3% decrease in latency for filters of size 5 × 5, at the cost of a constant increase in BRAMs, no change in DSPs, and a 122% increase in flip-flop (FF) and lookup-table (LUT) usage.
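To illustrate why Winograd convolution lowers the arithmetic cost, the sketch below shows the standard 1-D minimal-filtering transform F(2,3), which computes two outputs of a 3-tap filter with four multiplications instead of six; the filter-side transform can be precomputed ahead of time because the weights are fixed at inference. This is a generic textbook example, not the paper's WORDA decomposition or its HLS kernels, and the function and variable names are ours.

```cpp
#include <cstdio>

// Sketch of the 1-D Winograd minimal-filtering algorithm F(2,3):
// two outputs of a 3-tap filter from 4 multiplications (direct form needs 6).
// Illustrative only; not the WORDA implementation described in the paper.
void winograd_f2_3(const float d[4], const float g[3], float y[2]) {
    // Filter transform (can be done offline, since weights are fixed at inference).
    const float G0 = g[0];
    const float G1 = 0.5f * (g[0] + g[1] + g[2]);
    const float G2 = 0.5f * (g[0] - g[1] + g[2]);
    const float G3 = g[2];

    // Input transform followed by the four element-wise multiplications.
    const float m0 = (d[0] - d[2]) * G0;
    const float m1 = (d[1] + d[2]) * G1;
    const float m2 = (d[2] - d[1]) * G2;
    const float m3 = (d[1] - d[3]) * G3;

    // Output transform.
    y[0] = m0 + m1 + m2;
    y[1] = m1 - m2 - m3;
}

int main() {
    const float d[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float g[3] = {0.5f, -1.0f, 2.0f};
    float y[2];
    winograd_f2_3(d, g, y);

    // Reference: direct 3-tap convolution producing the same two outputs.
    const float r0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2];
    const float r1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2];
    std::printf("winograd: %.3f %.3f  direct: %.3f %.3f\n", y[0], y[1], r0, r1);
    return 0;
}
```

The same idea generalizes to 2-D tiles, where the multiplication savings compound and, on an FPGA, translate into fewer DSP operations per output tile.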
