Tight Compression: Compressing CNN Through Fine-Grained Pruning and Weight Permutation for Efficient Implementation

Xizi Chen, Chi-Ying Tsui, Jingbo Jiang, Jingyang Zhu

doi:10.1109/tcad.2022.3178047

Abstract

The unstructured sparsity after pruning poses a challenge to the efficient implementation of deep learning models in existing regular architectures like systolic arrays. On the other hand, coarse-grained structured pruning is suitable for implementation in regular architectures but tends to have higher accuracy loss than unstructured pruning when the pruned models are of the same size. In this work, we propose a model compression method based on a novel weight permutation scheme to fully exploit the fine-grained weight sparsity in the hardware design. Through permutation, the optimal arrangement of the weight matrix is obtained, and the sparse weight matrix is further compressed to a small and dense format to make full use of the hardware resources. Two pruning granularities are explored. In addition to the unstructured weight pruning, we also propose a more fine-grained subword-level pruning to further improve the compression performance. Compared to the state-of-the-art works, the matrix compression rate is significantly improved from $5.88\times$ to $14.13\times$. As a result, the throughput and energy efficiency are improved by 2.75 and 1.86 times, respectively.
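
The abstract does not spell out the permutation algorithm, so the following is only a toy illustration of the general idea, not the authors' method. It assumes plain magnitude pruning and a hypothetical greedy first-fit grouping (`pack_rows`) that merges rows whose nonzero columns do not collide, so that several sparse rows share one dense row. All function names and the matrix size are assumptions made for this sketch.

```python
import random


def magnitude_prune(matrix, sparsity):
    """Zero out roughly the smallest-magnitude `sparsity` fraction of weights."""
    flat = sorted(abs(w) for row in matrix for w in row)
    k = int(len(flat) * sparsity)
    if k == 0:
        return [row[:] for row in matrix]
    threshold = flat[k - 1]  # ties at the threshold are pruned as well
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in matrix]


def pack_rows(matrix):
    """Greedily merge rows whose nonzero columns do not collide.

    Returns a list of (row_ids, merged) groups, where merged[c] is either
    None or a (source_row, weight) pair. This is a first-fit heuristic,
    assumed here for illustration only.
    """
    groups = []
    # Place denser rows first; they are the hardest to fit.
    order = sorted(range(len(matrix)),
                   key=lambda r: -sum(w != 0 for w in matrix[r]))
    for r in order:
        support = {c for c, w in enumerate(matrix[r]) if w != 0}
        for members, merged in groups:
            if all(merged[c] is None for c in support):
                break  # this group has free slots in every needed column
        else:
            members, merged = [], [None] * len(matrix[r])
            groups.append((members, merged))
        members.append(r)
        for c in support:
            merged[c] = (r, matrix[r][c])
    return groups


def unpack(groups, n_rows, n_cols):
    """Rebuild the sparse matrix from the packed groups (lossless check)."""
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for _, merged in groups:
        for c, slot in enumerate(merged):
            if slot is not None:
                r, w = slot
                out[r][c] = w
    return out


random.seed(0)
W = [[random.gauss(0, 1) for _ in range(16)] for _ in range(32)]
P = magnitude_prune(W, 0.9)          # keep about 10% of the weights
groups = pack_rows(P)
rate = len(P) / len(groups)          # rows before / rows after packing
print(f"{len(P)} sparse rows packed into {len(groups)} dense rows "
      f"(row compression {rate:.2f}x)")
```

In this sketch the order in which rows are considered changes how many dense rows result, which is one way to see why searching over permutations of the weight matrix can improve the compression rate. The per-element source-row index stored in each slot stands in for whatever metadata a real hardware format would need.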

Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Publication Date: Feb 1, 2023
Citations: 3

Similar Papers

Tight Compression: Compressing CNN Model Tightly Through Unstructured Pruning and Simulated Annealing Based Permutation
Xizi Chen ... Jingbo Jiang
01 Jul 2020

An Efficient CNN Training Accelerator Leveraging Transposable Block Sparsity
Mingyang Xu ... Jinming Lu
13 Jun 2022

Optimization on parametric model
Fenfen Huang ... Wenbin Yao
01 Apr 2018

Hardware-Software Codesign of Weight Reshaping and Systolic Array Multiplexing for Efficient CNNs
Jingyao Zhang ... Grace Li Zhang
01 Feb 2021
