Abstract

Deep convolutional neural networks (CNNs) have achieved remarkable performance at the cost of huge computation. As CNN models become deeper and more complex, compressing them into sparse models by pruning redundant connections has emerged as an attractive approach to reduce computation and memory requirements. Meanwhile, FPGAs have been demonstrated to be an effective hardware platform for accelerating CNN inference. However, most existing FPGA accelerators target dense CNN models and are inefficient when executing sparse models, as most of the arithmetic operations involve multiplication and addition with zero operands. In this work, we propose an accelerator with software–hardware co-design for sparse CNNs on FPGAs. To efficiently handle the irregular connections in sparse convolutional layers, we propose a weight-oriented dataflow that exploits element–matrix multiplication as the key operation. Each weight is processed individually, which yields low decoding overhead. We then design an FPGA accelerator that features a tile look-up table (TLUT) and a channel multiplexer (CMUX). The TLUT matches the indices between sparse weights and input pixels, reducing the runtime decoding overhead to an efficient indexing operation. Moreover, we propose a weight layout that enables conflict-free on-chip memory access; to cooperate with this layout, a CMUX is inserted to locate the addresses. Finally, we build a neural architecture search (NAS) engine that leverages the reconfigurability of FPGAs to generate an efficient CNN model and choose the optimal hardware design parameters. Experiments demonstrate that our accelerator achieves 223.4–309.0 GOP/s for modern CNNs on the Xilinx ZCU102, a 2.4×–12.9× speedup over previous dense CNN accelerators on FPGAs. Our FPGA-aware NAS approach shows a 2× speedup over MobileNetV2 with 1.5% accuracy loss.
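To make the weight-oriented dataflow concrete, the following is a minimal NumPy sketch of the element–matrix multiplication idea described above; it is not the accelerator implementation, and the function and variable names are illustrative. Each nonzero weight is multiplied by the entire input tile it covers and accumulated into its output channel, so only the nonzero weights and their indices need to be decoded.

```python
import numpy as np

def sparse_conv_weight_oriented(ifm, sparse_weights, out_shape, stride=1):
    """Weight-oriented sparse convolution (illustrative sketch).

    ifm            : input feature maps, shape (C_in, H, W)
    sparse_weights : iterable of (value, oc, ic, kr, kc) tuples,
                     listing only the nonzero weights and their indices
    out_shape      : (C_out, H_out, W_out)
    """
    ofm = np.zeros(out_shape)
    _, h_out, w_out = out_shape
    for value, oc, ic, kr, kc in sparse_weights:
        # Element-matrix multiplication: one nonzero weight scales the whole
        # input tile it touches, and the result is accumulated into its
        # output channel. In the accelerator, the tile look-up table (TLUT)
        # performs this weight-to-pixel index matching in hardware.
        tile = ifm[ic,
                   kr:kr + h_out * stride:stride,
                   kc:kc + w_out * stride:stride]
        ofm[oc] += value * tile
    return ofm
```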
