DNNVM: End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-Based CNN Accelerators

Yu Xing,Xin Liu,Yushun Wang,Xijie Jia,Yi Shan,Lingzhi Sui,Shuang Liang,Jiantao Qiu,Yu Wang

doi:10.1109/tcad.2019.2930577

Abstract

The convolutional neural network (CNN) has become a state-of-the-art method for several artificial intelligence domains in recent years. The increasingly complex CNN models are both computation-bound and I/O-bound. Field-programmable gate array-based accelerators driven by custom instruction set architecture (ISA) achieve a balance between generality and efficiency, but there is much on them left to be optimized. We propose the full-stack compiler deep neural network virtual machine (DNNVM), which is an integration of optimizers for graphs, loops and data layouts, an assembler, a runtime supporter, and a validation environment. The DNNVM works in the context of deep learning frameworks and transforms CNN models into the directed acyclic graph: XGraph. Based on XGraph, we transform the optimization challenges for both data layout and pipeline into graph-level problems. DNNVM enumerates all potentially profitable fusion opportunities by a heuristic subgraph isomorphism algorithm to leverage pipeline and data layout optimizations, and searches for the best choice of execution strategies of the whole computing graph. On the Xilinx ZU2@330 MHz and ZU9@330 MHz, we achieve equivalently state-of-the-art performance on our benchmarks by naive implementations without optimizations, and the throughput is further improved up to $1.26\times $ by leveraging heterogeneous optimizations in DNNVM. Finally, with ZU9@330 MHz, we achieve state-of-the-art performance for VGG and ResNet50. We achieve a throughput of 2.82 TOPs/s and an energy efficiency of 123.7 GOPs/s/W for VGG. Additionally, we achieve 1.38 TOPs/s for ResNet50 and 1.41 TOPs/s for GoogleNet.

Full Text

Paper version not known

Open DOI Link

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

R Discovery Prime

R Discovery Prime

DNNVM: End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-Based CNN Accelerators

Abstract

Talk to us

Similar Papers

More From: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Lead the way for us

Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems	Publication Date: Jul 30, 2019
Citations: 88

Similar Papers

DNNVM
Yu Xing ... Lingzhi Sui
-
Yu Xing, et. al.Yu Xing ... Lingzhi Sui
20 Feb 2019
20 Feb 2019

Tunnel boring machine vibration-based deep learning for the ground identification of working faces
Mengbo Liu ... Yongliang Huang
Journal of Rock Mechanics and Geotechnical Engineering | VOL. 13
Mengbo Liu, et. al.Mengbo Liu ... Yongliang Huang
01 Dec 2021
Journal of Rock Mechanics and Geotechnical Engineering | VOL. 13

Artificial intelligence: finding the intersection of predictive modeling and clinical utility
Karthik Ravi
Gastrointestinal Endoscopy | VOL. 93
Karthik RaviKarthik Ravi
07 Mar 2021
Gastrointestinal Endoscopy | VOL. 93

Hyperspectral signature-band extraction and learning: an example of sugar content prediction of Syzygium samarangense
Yung-Jhe Yan ... Mang Ou-Yang
Scientific Reports | VOL. 13
Yung-Jhe Yan, et. al.Yung-Jhe Yan ... Mang Ou-Yang
12 Sep 2023
Scientific Reports | VOL. 13

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

DNNVM: End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-Based CNN Accelerators

Abstract

Talk to us

Similar Papers

More From: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems