Abstract

This paper presents a platform that automatically generates custom hardware accelerators for convolutional neural networks (CNNs) implemented in field-programmable gate array (FPGA) devices, together with a user interface for configuring and managing these accelerators. Starting from a CNN architecture description at both the layer and internal-parameter levels, the platform performs all the processes necessary to design and test the corresponding accelerator: it trains the desired architecture with any dataset and generates the configuration files required by the platform. With these files, it synthesizes the register-transfer level (RTL) design and programs the customized CNN accelerator into the FPGA device for testing, making it possible to generate custom CNN accelerators quickly and easily. All processes except the CNN architecture description are fully automated and carried out by the platform, which manages third-party software to train the CNN and to synthesize and program the generated RTL. The platform has been tested by implementing several state-of-the-art CNN architectures for freely available datasets such as MNIST, CIFAR-10, and STL-10.
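To make the described flow concrete, the sketch below shows the kind of layer-level architecture description such a platform could take as input before training and RTL generation. The dictionary keys, field names, and values are purely illustrative assumptions and do not reflect the platform's actual configuration-file format or API.

    # Hypothetical, illustrative layer-level CNN description (not the platform's real format).
    # It captures the two levels mentioned in the abstract: the layer structure and
    # internal parameters (e.g., fixed-point word lengths) of the accelerator.
    cnn_description = {
        "dataset": "MNIST",                  # any dataset may be used for training
        "input_shape": [28, 28, 1],
        "layers": [
            {"type": "conv",    "filters": 8,  "kernel": 3, "activation": "relu"},
            {"type": "maxpool", "size": 2},
            {"type": "conv",    "filters": 16, "kernel": 3, "activation": "relu"},
            {"type": "maxpool", "size": 2},
            {"type": "dense",   "units": 10,   "activation": "softmax"},
        ],
        # Internal (hardware-template) parameters, assumed names for illustration only.
        "weight_bits": 8,
        "activation_bits": 8,
    }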

Highlights

  • In recent years, the field of computer vision has evolved dramatically, mainly due to the promising practical results obtained when using deep neural networks, such as convolutional neural networks (CNNs), for typical machine vision tasks such as image classification [1], object detection [2], and tracking [3].

  • The main purpose of the proposed platform is to generate CNN accelerators from custom and conventional CNN models automatically using highly configurable hardware templates, which is demonstrated by generating some low-end conventional and custom CNN models for freely available datasets such as MNIST, CIFAR-10, and STL-10.

  • The platform was tested on a PC with an HDD, an i5-8300H central processing unit (CPU) @ 2.3 GHz, 8 GB DRAM @ 2667 MHz, and a 4 GB Nvidia 1050 graphics processing unit (GPU).


Introduction

The field of computer vision has evolved dramatically, mainly due to the promising practical results obtained when using deep neural networks, such as convolutional neural networks (CNNs), for typical machine vision tasks such as image classification [1], object detection [2], and tracking [3]. CNNs are often implemented on graphics processing units (GPUs) because of their high computing capability and the availability of automated tools (e.g., TensorFlow, Matlab, and Caffe), which make it easy to create and deploy customized CNN architectures on multiple devices, such as GPUs and central processing units (CPUs). Although a GPU exceeds a CPU in computing throughput, it is characterized by very high power consumption, making it infeasible for embedded applications, which are very common in the field of computer vision. FPGA CNN implementations have proven to be faster than some high-end GPUs while maintaining higher power efficiency, as stated in [4], taking advantage of the configurability of FPGA devices and their capability to generate massively parallel and pipelined systems.
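As an illustration of the kind of layer-level CNN description these frameworks provide, a minimal TensorFlow/Keras model for MNIST (one of the datasets used to test the platform) is sketched below; the architecture and hyperparameters are arbitrary examples, not the paper's reference design.

    # Illustrative only: a small layer-level CNN description for MNIST in Keras.
    # The layer sizes are example values and do not correspond to the paper's models.
    import tensorflow as tf

    def build_mnist_cnn(num_classes: int = 10) -> tf.keras.Model:
        """Small CNN whose layer/parameter description could drive accelerator generation."""
        return tf.keras.Sequential([
            tf.keras.layers.Input(shape=(28, 28, 1)),
            tf.keras.layers.Conv2D(8, kernel_size=3, activation="relu"),
            tf.keras.layers.MaxPooling2D(pool_size=2),
            tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),
            tf.keras.layers.MaxPooling2D(pool_size=2),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(num_classes, activation="softmax"),
        ])

    if __name__ == "__main__":
        # Train briefly on MNIST; the trained weights are what an accelerator
        # generation flow would later quantize and map to hardware.
        (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
        x_train = x_train[..., None].astype("float32") / 255.0
        model = build_mnist_cnn()
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(x_train, y_train, epochs=1, batch_size=128)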
