Abstract

A Convolutional Neural Network (CNN) is a special kind of feed-forward Artificial Neural Network that is generally used for fast and accurate image recognition. This capability is in high demand in embedded systems for a variety of applications. Embedded systems, in turn, need portable, low-power, and area-efficient hardware accelerators. In addition, the large amount of processing a CNN requires calls for dedicated, custom-built hardware implementations. The convolution part of a CNN is a highly parallelizable Digital Signal Processing (DSP) algorithm, which makes it well suited to Field Programmable Gate Array (FPGA) implementation, since FPGAs excel at exploiting parallelism. In this paper, we present an implementation of a CNN on an FPGA. We propose a smaller version of LeNet-5 with about 95% fewer parameters that achieves an accuracy of 95.33% on digit recognition. The complete modified architecture is implemented in the Verilog Hardware Description Language with the aim of improving timing performance in the inference phase. The proposed work is compared with a software implementation using Keras on an 8th-generation i5 processor. The results clearly demonstrate acceleration over the software implementation. The hardware architecture is designed to fit optimally on a Kintex-7 xck325ttfg900-2l FPGA. The results can be readily extrapolated to an improved architecture with four times more parallelism on FPGAs that have more DSP slices (e.g., the Virtex-7 series).
