Abstract

Deep neural networks have revolutionized applications across diverse domains such as autonomous vehicles, weather forecasting, cancer detection, surveillance, and traffic management. The convolutional neural network (CNN) is the state-of-the-art technique for many machine learning tasks in the image and video processing domains. Deploying CNNs on embedded systems with limited processing power and small power budgets is a challenging task. Recent studies have shown the effectiveness of the field-programmable gate array (FPGA) as a hardware accelerator for CNNs, delivering high performance at low power budgets. The majority of computations in CNNs involve 2-D convolution. The Winograd minimal filtering algorithm is the most efficient technique for computing convolutions with small filter sizes. CNNs also contain fully connected layers, which are computed using general matrix multiplication (GEMM). In this article, we propose a unified architecture named UniWiG, in which both Winograd-based convolution and GEMM are accelerated using the same set of processing elements. This approach leads to efficient utilization of FPGA hardware resources while computing all layers in the CNN. The proposed architecture shows performance improvements ranging from $1.4\times $ to $4.02\times $ with only 13% additional FPGA resources with respect to the baseline GEMM-based architecture. We have mapped popular CNN models such as AlexNet and VGG-16 onto the proposed accelerator, and the measured performance compares favorably with other state-of-the-art implementations. We have also analyzed the vulnerability of the accelerator to side-channel attacks. Preliminary investigations show that the UniWiG architecture is more robust to memory side-channel attacks than direct convolution-based techniques.
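To illustrate the multiplication savings that make Winograd minimal filtering attractive for small filters, the following sketch implements the standard 1-D Winograd transform $F(2,3)$ (two outputs of a 3-tap filter using 4 multiplications instead of the 6 required by the direct method) and checks it against direct sliding-window computation. This is a minimal reference in software only; the function names `winograd_f23` and `direct_conv` are illustrative and do not come from the UniWiG architecture described in the article.

```python
import numpy as np

def winograd_f23(d, g):
    """Winograd minimal filtering F(2,3): computes two outputs of a
    1-D convolution of input tile d (length 4) with 3-tap filter g,
    using 4 multiplications instead of the direct method's 6."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

def direct_conv(d, g):
    """Direct sliding-window computation (as used in CNN layers),
    for comparison: 3 multiplications per output, 6 in total."""
    return np.array([d[i] * g[0] + d[i + 1] * g[1] + d[i + 2] * g[2]
                     for i in range(2)])

d = np.array([1.0, 2.0, 3.0, 4.0])   # input tile
g = np.array([1.0, 0.0, -1.0])       # 3-tap filter
assert np.allclose(winograd_f23(d, g), direct_conv(d, g))
```

The filter-dependent factors in `m2` and `m3` can be precomputed once per filter, so the per-tile cost is dominated by the 4 element-wise multiplications; this is the property that lets Winograd-based convolution share multiplier resources with a GEMM datapath.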
