Automatic Compilation of Diverse CNNs Onto High-Performance FPGA Accelerators

Yufei Ma, Jae-Sun Seo, Yu Cao, Sarma Vrudhula

doi:10.1109/tcad.2018.2884972

Abstract

A broad range of applications increasingly benefits from the rapid development of convolutional neural networks (CNNs). FPGA-based CNN inference accelerators are gaining popularity due to their high performance and low power consumption, as well as the FPGA's conventional advantages of reconfigurability and flexibility. Without a general compiler to automate the implementation, however, significant effort and expertise are still required to customize the design for each CNN model. In this paper, we present a register-transfer level (RTL) CNN compiler that automatically generates customized FPGA hardware for the inference tasks of various CNNs, enabling fast high-level prototyping of CNNs from software to FPGA while keeping the benefits of low-level hardware optimization. First, a general-purpose library of RTL modules is developed to model the different operations at each layer. The integration and dataflow of the physical modules are predefined in a top-level system template and reconfigured during compilation for a given CNN algorithm. The runtime control of layer-by-layer sequential computation is managed by the proposed execution schedule, so that even highly irregular and complex network topologies, e.g., GoogLeNet and ResNet, can be compiled. The proposed methodology is demonstrated with various CNN algorithms, e.g., NiN, VGG, GoogLeNet, and ResNet, on two standalone Intel FPGAs, Arria 10 and Stratix 10, achieving end-to-end inference throughputs of 969 GOPS and 1604 GOPS, respectively, with a batch size of one.
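To make the idea of a layer-by-layer execution schedule concrete, the following is a minimal sketch of how a branching layer graph could be flattened into one sequential order. It is not the compiler described in the paper: the function name `sequential_schedule`, the dictionary-based graph format, and the layer names are all assumptions made for illustration, and the ordering technique used here is a plain topological sort (Kahn's algorithm).

```python
from collections import deque


def sequential_schedule(layers):
    """Return one valid layer-by-layer execution order for a layer graph.

    `layers` maps a layer name to the list of layers whose outputs it
    consumes. This graph format is hypothetical, not taken from the paper.
    """
    # Number of inputs each layer is still waiting for.
    pending = {name: len(inputs) for name, inputs in layers.items()}

    # Reverse edges: which layers consume the output of a given layer.
    consumers = {name: [] for name in layers}
    for name, inputs in layers.items():
        for src in inputs:
            consumers[src].append(name)

    # Layers with no inputs can run first.
    ready = deque(name for name, count in pending.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for nxt in consumers[name]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(layers):
        raise ValueError("layer graph contains a cycle")
    return order


# Hypothetical ResNet-style block: the shortcut feeds `input` directly
# into the element-wise addition, alongside the two-convolution path.
residual_block = {
    "input": [],
    "conv1": ["input"],
    "conv2": ["conv1"],
    "eltwise_add": ["conv2", "input"],
    "relu": ["eltwise_add"],
}
schedule = sequential_schedule(residual_block)
print(schedule)
```

In this sketch the shortcut connection only delays `eltwise_add` until both of its inputs exist, which is the property that lets a branching topology run one layer at a time.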

Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Publication Date: Feb 1, 2020
Citations: 69
License type: publisher-specific, author manuscript

Similar Papers

An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks
Yufei Ma ... Yu Cao
01 Sep 2017

Computed Tomography Image Based on Intelligent Segmentation Algorithm in the Diagnosis of Ovarian Tumor
Ling Zhu ... Yucheng He
Scientific Programming | VOL. 2021
13 Nov 2021

Application of CNN Algorithm Based on Chaotic Recursive Diagonal Model in Medical Image Processing.
Fangfang Ye ... Ting Wang
Computational Intelligence and Neuroscience | VOL. 2021
01 Jan 2020

Single-Photon Emission Computed Tomography Image-Assisted Diagnosis of Thyroid Diseases under Convolutional Network Neural Algorithm
Shaobo Chen ... Haiyan Gong
Scientific Programming | VOL. 2021
16 Dec 2021
