Abstract

Convolutional neural networks (CNNs) are widely used in modern applications for their versatility and high classification accuracy. Field-programmable gate arrays (FPGAs) are considered suitable platforms for CNNs based on their high performance, rapid development, and reconfigurability. Although many studies have proposed methods for implementing high-performance CNN accelerators on FPGAs using optimized data types and algorithm transformations, accelerators can be optimized further by investigating more efficient uses of FPGA resources. In this paper, we propose an FPGA-based CNN accelerator using multiple approximate accumulation units based on a fixed-point data type. We implemented the LeNet-5 CNN architecture, which performs classification of handwritten digits using the MNIST handwritten digit dataset. The proposed accelerator was implemented using a high-level synthesis tool on a Xilinx FPGA. The proposed accelerator applies an optimized fixed-point data type and loop parallelization to improve performance. Approximate operation units are implemented using FPGA logic resources instead of high-precision digital signal processing (DSP) blocks, which are inefficient for low-precision data. Our accelerator model achieves 66% less memory usage and approximately 50% lower network latency compared to a floating-point design, and its resource utilization is optimized to use 78% fewer DSP blocks compared to general fixed-point designs.
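
As an illustration of the kind of arithmetic such a design relies on, the following minimal C++ sketch shows a fixed-point multiply-accumulate, the operation that a low-precision datapath can map to FPGA logic resources rather than full-width DSP blocks. The Q8.8 format, function names, and bit widths are illustrative assumptions, not the paper's exact configuration.

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative fixed-point format: 16-bit signed value with 8 fractional bits (Q8.8).
// The accelerator's actual word length and fraction width may differ.
constexpr int FRAC_BITS = 8;

// Convert a real number to the Q8.8 fixed-point representation.
int16_t to_fixed(float x) { return static_cast<int16_t>(x * (1 << FRAC_BITS)); }

// Convert a Q8.8 fixed-point value back to a real number.
float to_float(int16_t x) { return static_cast<float>(x) / (1 << FRAC_BITS); }

// Multiply-accumulate over fixed-point inputs and weights.
// Products are kept in a wider 32-bit accumulator and rescaled once at the end;
// this narrow datapath is the kind that can be built from LUT logic instead of
// high-precision DSP blocks.
int16_t fixed_mac(const int16_t* in, const int16_t* w, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; ++i)
        acc += static_cast<int32_t>(in[i]) * static_cast<int32_t>(w[i]);
    return static_cast<int16_t>(acc >> FRAC_BITS);  // rescale back to Q8.8
}

int main() {
    int16_t in[3] = { to_fixed(0.5f), to_fixed(-1.25f), to_fixed(2.0f) };
    int16_t w[3]  = { to_fixed(1.0f), to_fixed(0.5f),  to_fixed(-0.75f) };
    printf("MAC result = %f\n", to_float(fixed_mac(in, w, 3)));  // expected -1.625
    return 0;
}
```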

Highlights

  • In many modern applications, convolutional neural networks (CNNs) are adopted for image classification based on their high versatility and accuracy

  • A CNN is a type of deep neural network (DNN) that utilizes a convolution algorithm based on a 2D array of inputs

  • The results on the maximum operating frequency, resource utilization, and power consumption are derived from post-implementation reports and the power estimation report from the Vivado tool

Introduction

Convolutional neural networks (CNNs) are adopted for image classification based on their high versatility and accuracy. Traditional CNNs use high-precision floating-point data types for both training and inference, but many recent studies have explored more efficient data types by reducing data sizes and applying quantization [9,10,11,12,13,14]. These studies have shown that CNNs can achieve improvements in performance and resource utilization by using low-precision data without a significant loss of classification accuracy. In a traditional DNN, each output is the sum of the inputs multiplied by their weights; a CNN instead uses kernels, and each output node is the sum of the element-wise products of a kernel and the overlapping region of the input feature map. Because the same kernel weights are reused at every output position, this is known as a shared-weight scheme, which allows a CNN to reduce the number of trainable parameters significantly while accelerating network training and inference.
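
To make the shared-weight scheme concrete, the following minimal C++ sketch computes a 2D convolution in which a single kernel is reused at every output position. The feature map size, kernel size, and "valid" (no padding) convolution are illustrative assumptions, not the LeNet-5 layer configuration from the paper.

```cpp
#include <cstdio>

constexpr int IN  = 5;           // input feature map width/height (illustrative)
constexpr int K   = 3;           // kernel width/height (illustrative)
constexpr int OUT = IN - K + 1;  // output size for a valid convolution

void conv2d(const float in[IN][IN], const float kernel[K][K], float out[OUT][OUT]) {
    for (int r = 0; r < OUT; ++r) {
        for (int c = 0; c < OUT; ++c) {
            float acc = 0.0f;
            // The same K x K kernel is reused at every output position,
            // so only K*K weights are trained instead of one weight per connection.
            for (int kr = 0; kr < K; ++kr)
                for (int kc = 0; kc < K; ++kc)
                    acc += in[r + kr][c + kc] * kernel[kr][kc];
            out[r][c] = acc;
        }
    }
}

int main() {
    float in[IN][IN], out[OUT][OUT];
    float kernel[K][K] = { {0, 1, 0}, {1, -4, 1}, {0, 1, 0} };  // example kernel
    for (int r = 0; r < IN; ++r)
        for (int c = 0; c < IN; ++c)
            in[r][c] = static_cast<float>(r * IN + c);  // simple ramp input
    conv2d(in, kernel, out);
    for (int r = 0; r < OUT; ++r) {
        for (int c = 0; c < OUT; ++c) printf("%6.1f ", out[r][c]);
        printf("\n");
    }
    return 0;
}
```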
