Abstract

Convolutional neural networks (CNNs), one of today's dominant flavors of deep learning, lead in various image recognition tasks. As the model size of modern CNNs continues to grow, neural network compression techniques have been proposed to prune redundant neurons and synapses. However, prior techniques decouple software neural network compression from hardware acceleration, and therefore fail to balance multiple design parameters, including sparsity, performance, hardware area cost, and energy efficiency. More concretely, prior unstructured pruning techniques achieve high sparsity at the expense of extra performance overhead, while prior structured pruning techniques, which rely on strict sparse patterns, lead to low sparsity and extra hardware cost. In this article, we propose OMNI, a framework for accelerating sparse CNNs on hardware accelerators. The innovation of OMNI is that it uses hardware-amenable on-chip memory partition patterns to seamlessly couple software CNN model compression with hardware CNN acceleration. To accelerate the compute-intensive convolution kernel, a promising hardware optimization approach is memory partitioning, which divides the original weight kernels into several groups so that different hardware processing elements can access the weights simultaneously. We exploit memory partition patterns, including block, cyclic, and hybrid, as CNN compression patterns. Our software CNN model compression balances sparsity across the groups, and our hardware accelerator parallelizes computation in coordination with the sparse patterns, yielding a desirable compromise between sparsity and performance. We further develop performance models to help designers quickly identify the pattern factors subject to an area constraint. Finally, we evaluate our design on application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) platforms. Experiments demonstrate that OMNI achieves a 3.4×–6.2× speedup on modern CNNs over a comparable ideal dense CNN accelerator. OMNI shows a 114.7× energy-efficiency improvement compared with a GPU platform. OMNI is also evaluated on Xilinx ZC706 and ZCU102 FPGA platforms, achieving 41.5 GOP/s and 125.3 GOP/s, respectively.
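
To make the three partition patterns named above concrete, the following is a minimal Python sketch of how block, cyclic, and hybrid (block-cyclic) partitioning could assign weight indices to on-chip memory banks. The function name, its parameters, and the bank-mapping helper are illustrative assumptions for exposition, not taken from the OMNI implementation.

```python
# Hypothetical illustration of block, cyclic, and hybrid (block-cyclic)
# memory partition patterns. Each weight index is mapped to one of
# `num_banks` on-chip memory banks; weights in different banks can be
# read by different processing elements in the same cycle.

def partition(num_weights, num_banks, mode, block_size=None):
    """Return the bank id for each weight index under a given pattern."""
    if mode == "block":
        # Contiguous chunks: bank 0 holds the first chunk, and so on.
        chunk = -(-num_weights // num_banks)  # ceiling division
        return [i // chunk for i in range(num_weights)]
    if mode == "cyclic":
        # Round-robin: consecutive weights land in consecutive banks.
        return [i % num_banks for i in range(num_weights)]
    if mode == "hybrid":
        # Block-cyclic: fixed-size blocks dealt out round-robin.
        return [(i // block_size) % num_banks for i in range(num_weights)]
    raise ValueError(f"unknown mode: {mode}")

if __name__ == "__main__":
    print("block: ", partition(12, 4, "block"))
    # -> [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
    print("cyclic:", partition(12, 4, "cyclic"))
    # -> [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
    print("hybrid:", partition(12, 4, "hybrid", block_size=2))
    # -> [0, 0, 1, 1, 2, 2, 3, 3, 0, 0, 1, 1]
```

Under the framework's approach, pruning would then be balanced per bank group induced by such a mapping, so that every processing element receives a similar amount of nonzero work.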
