Abstract

A detailed methodology is presented for implementing fully connected (FC) deep neural network (DNN) and convolutional neural network (CNN) inference systems on a field-programmable gate array (FPGA). The DNN is built from minimal computational units, while the CNN employs a systolic array (SA) architecture that exploits parallel processing. Algorithmic analysis determines the optimum memory requirement for the fixed-point trained parameters. The size of the trained parameters and the memory available on the target FPGA device govern the choice of on-chip memory. Experimental results indicate that choosing block over distributed memory saves ≈62% of look-up tables (LUTs) for the [784-512-512-10] DNN, while choosing distributed over block memory saves ≈30% of block random access memory (BRAM) for the LeNet-5 CNN unit. This study provides insights for developing FPGA-based digital systems for applications requiring DNNs and CNNs.
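
The abstract's memory-sizing argument can be illustrated with a short sketch: quantizing trained parameters to fixed point and estimating the resulting on-chip storage footprint for the [784-512-512-10] FC network. This is a minimal illustration, not the paper's method; the 16-bit word width, the 8-bit fraction, the function names, and the 18 Kb BRAM tile size are assumptions for the example (the paper derives the optimum widths algorithmically).

```python
import numpy as np

def quantize_fixed_point(weights, total_bits=16, frac_bits=8):
    """Quantize float weights to signed fixed-point with `frac_bits`
    fractional bits, saturating to the representable range.
    Bit widths here are illustrative assumptions, not the paper's values."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return np.clip(np.round(weights * scale), lo, hi).astype(np.int32)

def memory_bits(layer_sizes, total_bits=16):
    """Total storage for the weights and biases of an FC network
    described by its layer sizes, e.g. [784, 512, 512, 10]."""
    params = sum(n_in * n_out + n_out
                 for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    return params * total_bits

if __name__ == "__main__":
    bits = memory_bits([784, 512, 512, 10])
    # Assumes an 18,432-bit (18 Kb) BRAM tile; the actual tile size
    # and block/distributed trade-off depend on the target device.
    print(f"{bits} bits ≈ {bits / 18432:.1f} BRAM18 tiles")
```

Comparing such an estimate against the LUT and BRAM budget of the target device is what drives the block-versus-distributed memory choice the abstract reports.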
