Abstract

In this study, we explore neural network acceleration on FPGAs by accelerating General Matrix Multiplication (GEMM). GEMM acceleration enables a regular, modular accelerator design and offers the benefit of scalability. GEMM-based designs also provide a degree of functional flexibility, which is key to keeping pace with the rapidly evolving architectures of Deep Learning algorithms. We quantify the theoretical performance model and tradeoffs of a GEMM accelerator and explore its design space. Moreover, we propose an accelerator design that exploits 8-bit quantization to increase effective bandwidth while preserving model accuracy, and that uses the FPGA fabric for model parallelization and data reuse to achieve high-performance, low-latency neural network inference. Lastly, we test and evaluate our design on the MNIST dataset. The proposed method optimizes hardware area in Deep Learning systems without sacrificing performance.
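As a rough illustration of the 8-bit GEMM path described above, the NumPy sketch below quantizes activations and weights to int8 with per-tensor scales, performs the matrix multiply with 32-bit accumulation (as a hardware processing-element array would), and dequantizes the result. The symmetric quantization scheme, scale factors, and layer shapes here are illustrative assumptions for a single MNIST-style dense layer, not details taken from the paper.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization of a float matrix to int8.

    Returns the int8 matrix and the scale needed to recover real values.
    (Illustrative scheme; the paper's quantizer may differ.)
    """
    scale = np.max(np.abs(x)) / 127.0 if np.any(x) else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def gemm_int8(a_q, b_q):
    """8-bit GEMM with 32-bit accumulation, mirroring what an accelerator PE array computes."""
    return a_q.astype(np.int32) @ b_q.astype(np.int32)

# Hypothetical example: one fully connected layer of an MNIST-style classifier.
rng = np.random.default_rng(0)
activations = rng.standard_normal((1, 784)).astype(np.float32)   # flattened 28x28 input
weights = rng.standard_normal((784, 10)).astype(np.float32)      # dense layer weights

a_q, a_scale = quantize_int8(activations)
w_q, w_scale = quantize_int8(weights)

acc = gemm_int8(a_q, w_q)                              # int32 accumulator output
logits = acc.astype(np.float32) * a_scale * w_scale    # dequantize back to real values

print(np.argmax(logits))                               # predicted class index
```

Because each operand occupies 8 bits instead of 32, four times as many values move per memory transaction, which is the bandwidth benefit the abstract refers to.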
