Harmonious Coexistence of Structured Weight Pruning and Ternarization for Deep Neural Networks

Li Yang, Deliang Fan, Zhezhi He

doi:10.1609/aaai.v34i04.6138

Abstract

Deep convolutional neural networks (DNNs) have demonstrated phenomenal success and are widely used in many computer vision tasks. However, their enormous model size and high computing complexity prohibit wide deployment on resource-limited embedded systems, such as FPGAs and mGPUs. As the two most widely adopted model compression techniques, weight pruning and quantization compress a DNN model by introducing weight sparsity (i.e., forcing a portion of the weights to zero) and by quantizing weights into limited bit-width values, respectively. Although prior works have attempted to combine weight pruning and quantization, we still observe disharmony between the two, especially when more aggressive compression schemes (e.g., structured pruning and low bit-width quantization) are used. In this work, taking an FPGA as the test computing platform and Processing Elements (PEs) as the basic parallel computing units, we first propose a PE-wise structured pruning scheme, which introduces weight sparsity while taking the PE architecture into account. In addition, we integrate it with an optimized weight ternarization approach that quantizes weights into ternary values ({-1,0,+1}), thus converting the dominant convolution operations in a DNN from multiplication-and-accumulation (MAC) to addition-only, and compressing the original model (from 32-bit floating point to 2-bit ternary representation) by at least 16 times. We then investigate and solve the coexistence issue between PE-wise structured pruning and ternarization by proposing a Weight Penalty Clipping (WPC) technique with a self-adapting threshold. Our experiments show that the fusion of the proposed techniques achieves a state-of-the-art ∼21× PE-wise structured compression rate with merely 1.74%/0.94% (top-1/top-5) accuracy degradation for ResNet-18 on the ImageNet dataset.
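
To make the ternarization claim concrete, the following is a minimal sketch of how ternary weights turn a dot product into additions and subtractions, and where the 16× figure comes from. It is not the paper's optimized ternarization or its WPC technique: the fixed `threshold` argument and the function names `ternarize`, `ternary_dot`, and `compression_ratio` are assumptions introduced purely for illustration.

```python
def ternarize(weights, threshold):
    """Map each weight to -1, 0, or +1.

    Assumption: a fixed magnitude threshold is used here for simplicity;
    the abstract does not specify how the paper chooses its threshold.
    """
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def ternary_dot(ternary_weights, activations):
    """Dot product with ternary weights using only add/subtract (no multiply)."""
    acc = 0.0
    for t, a in zip(ternary_weights, activations):
        if t == 1:
            acc += a      # weight +1: add the activation
        elif t == -1:
            acc -= a      # weight -1: subtract the activation
        # weight 0: skip entirely (this is where sparsity saves work)
    return acc


def compression_ratio(full_bits=32, ternary_bits=2):
    """Storage ratio from quantization alone, before any pruning gains."""
    return full_bits / ternary_bits


weights = [0.8, -0.05, -0.6, 0.02]
activations = [1.0, 2.0, 3.0, 4.0]

t = ternarize(weights, threshold=0.1)
print(t)                            # [1, 0, -1, 0]
print(ternary_dot(t, activations))  # -2.0
print(compression_ratio())          # 16.0
```

The 16× ratio accounts only for the change in bit-width; the abstract's "at least" reflects that structured pruning can remove weights on top of this.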

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence
Publication Date: Apr 3, 2020
Citations: 23
