Abstract

A detailed methodology is presented for implementing fully connected (FC) deep neural network (DNN) and convolutional neural network (CNN) inference systems on a field-programmable gate array (FPGA). The DNN is built from minimal computational units, while the CNN employs a systolic array (SA) architecture that exploits parallel processing. Algorithmic analysis determines the optimum memory requirement for the fixed-point trained parameters. The size of the trained parameters and the memory available on the target FPGA device govern the choice of on-chip memory. Experimental results indicate that choosing block over distributed memory saves ≈62% of look-up tables (LUTs) for the [784-512-512-10] DNN, while choosing distributed over block memory saves ≈30% of block random access memory (BRAM) for the LeNet-5 CNN unit. This study provides insights for developing FPGA-based digital systems for applications requiring DNNs and CNNs.
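
The abstract's memory-sizing argument can be illustrated with a short sketch: quantizing trained parameters to fixed point and estimating the resulting on-chip storage footprint for the [784-512-512-10] FC network. This is a minimal illustration, not the paper's method; the 16-bit word width, the 8-bit fraction, the function names, and the 18 Kb BRAM tile size are assumptions for the example (the paper derives the optimum widths algorithmically).

```python
import numpy as np

def quantize_fixed_point(weights, total_bits=16, frac_bits=8):
    """Quantize float weights to signed fixed-point with `frac_bits`
    fractional bits, saturating to the representable range.
    Bit widths here are illustrative assumptions, not the paper's values."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return np.clip(np.round(weights * scale), lo, hi).astype(np.int32)

def memory_bits(layer_sizes, total_bits=16):
    """Total storage for the weights and biases of an FC network
    described by its layer sizes, e.g. [784, 512, 512, 10]."""
    params = sum(n_in * n_out + n_out
                 for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    return params * total_bits

if __name__ == "__main__":
    bits = memory_bits([784, 512, 512, 10])
    # Assumes an 18,432-bit (18 Kb) BRAM tile; the actual tile size
    # and block/distributed trade-off depend on the target device.
    print(f"{bits} bits ≈ {bits / 18432:.1f} BRAM18 tiles")
```

Comparing such an estimate against the LUT and BRAM budget of the target device is what drives the block-versus-distributed memory choice the abstract reports.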
