Abstract

Bit-serial architectures (BSAs) are becoming increasingly popular in low-power neural network processor (NNP) designs for edge scenarios. However, the performance and energy efficiency of state-of-the-art BSA NNPs heavily depend on both the proportion and the distribution of ineffectual weight bits in neural networks (NNs). To boost the performance of typical BSA accelerators, we present Bit-Pruner, a software approach that learns BSA-favored NNs without resorting to hardware modifications. Bit-Pruner not only progressively prunes but also restructures the non-zero bits in weights, so that the number of non-zero bits in the model is reduced and the corresponding computation is load-balanced to suit the target BSA accelerators. On top of Bit-Pruner, we further propose a Pareto frontier optimization algorithm that adjusts the bit-pruning rate across network layers to fulfill diverse NN processing requirements in terms of performance and accuracy for various edge scenarios. However, aggressive bit-pruning can lead to non-trivial accuracy loss, especially for lightweight NNs and complex tasks. To this end, the alternating direction method of multipliers (ADMM) is adapted to the retraining phase of Bit-Pruner to smooth the abrupt disturbance caused by bit-pruning and enhance the resulting model accuracy. According to the experiments, Bit-Pruner increases bit-sparsity to up to 94.4% with negligible accuracy degradation and achieves an optimized trade-off between NN accuracy and energy efficiency even under very aggressive performance constraints. When pruned models are deployed onto typical BSA accelerators, the average performance is 2.1X and 1.6X higher than that of the baseline networks without pruning and those with classical weight pruning, respectively.
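To make the core idea concrete, the sketch below illustrates one plausible form of bit-pruning on quantized weights: keeping only the few most-significant set bits of each weight's binary magnitude, which raises bit-sparsity and lets a bit-serial accelerator skip the pruned zero bits. This is a minimal illustration, not the paper's implementation; the function name `bit_prune`, the per-weight budget `max_bits`, and the 8-bit quantization are assumptions, and the paper's full method additionally restructures bits for load balancing and retrains with ADMM.

```python
# Minimal sketch of bit-pruning (assumed form, not the authors' code):
# keep at most `max_bits` most-significant set bits of each quantized weight.
import numpy as np

def bit_prune(weights_q, max_bits=2, num_bits=8):
    """Return weights with at most `max_bits` non-zero bits per magnitude.

    weights_q: integer array of quantized weights (e.g., int8).
    """
    w32 = weights_q.astype(np.int32)
    sign = np.sign(w32)
    mag = np.abs(w32)
    pruned = np.zeros_like(mag)
    kept = np.zeros_like(mag)  # set bits retained so far, per weight
    # Scan bit positions from MSB to LSB, keeping set bits until the budget is hit.
    for b in range(num_bits - 1, -1, -1):
        bit = (mag >> b) & 1
        take = bit & (kept < max_bits).astype(np.int32)
        pruned |= take << b
        kept += take
    return (sign * pruned).astype(weights_q.dtype)

# Example: 109 = 0b1101101 keeps its top two set bits -> 0b1100000 = 96.
w = np.array([109, -77, 3, 0], dtype=np.int8)
print(bit_prune(w, max_bits=2))  # -> [ 96 -72   3   0]
```

Truncating low-order set bits changes each weight by at most the value of its highest dropped bit, which is why the paper pairs pruning with retraining (here, ADMM-based) to recover accuracy.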
