Abstract

Current edge devices for neural networks, such as FPGAs, CPLDs, and ASICs, can support low bit-width computing to improve execution latency and energy efficiency, but traditional linear quantization can only maintain the inference accuracy of neural networks at bit-widths above 6 bits. Unlike previous studies that address this problem by clipping outliers, this paper proposes a two-stage quantization method. Before converting the weights into fixed-point numbers, the network is first pruned by unstructured pruning, and the weights are then clustered with the K-means algorithm to preserve their distribution. To address the instability of K-means results, the particle swarm optimization (PSO) algorithm is used to obtain the initial cluster centroids. Experimental results on baseline deep networks such as ResNet-50, Inception-v3, and DenseNet-121 show that the proposed optimized quantization method can generate a 5-bit network with an accuracy loss of less than 5% and a 4-bit network with only 10% accuracy loss compared to 8-bit quantization. Through quantization and pruning, this method reduces the model bit-width from 32 to 4 and the number of neurons by 80%. Additionally, it can be easily integrated into frameworks such as TensorRT and TensorFlow Lite for low bit-width network quantization.
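Below is a minimal sketch (not the authors' released code) of the two-stage pipeline the abstract describes, assuming a plain NumPy setting: magnitude-based unstructured pruning, a simple PSO that searches over candidate centroid vectors, and Lloyd-style K-means refinement that builds a 2^b-entry weight codebook. The 80% sparsity target, the PSO hyperparameters (inertia 0.7, acceleration coefficients 1.5), and the iteration counts are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def prune_magnitude(w, sparsity=0.8):
    """Unstructured pruning: zero out the smallest-magnitude weights."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < thresh, 0.0, w)

def clustering_cost(values, centroids):
    """Sum of squared distances from each value to its nearest centroid."""
    d = np.abs(values[:, None] - centroids[None, :])
    return float((d.min(axis=1) ** 2).sum())

def pso_init_centroids(values, k, n_particles=20, iters=50, seed=0):
    """PSO over candidate centroid vectors; returns the best initialization found.
    Hyperparameters (inertia 0.7, coefficients 1.5) are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    lo, hi = values.min(), values.max()
    pos = rng.uniform(lo, hi, size=(n_particles, k))   # each particle = k centroids
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([clustering_cost(values, p) for p in pos])
    gbest = pbest[pbest_cost.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        cost = np.array([clustering_cost(values, p) for p in pos])
        improved = cost < pbest_cost
        pbest[improved], pbest_cost[improved] = pos[improved], cost[improved]
        gbest = pbest[pbest_cost.argmin()].copy()
    return np.sort(gbest)

def kmeans_quantize(w, bits=4, lloyd_iters=25, **pso_kw):
    """Cluster the nonzero weights into 2**bits centroids and snap each weight."""
    flat = w.ravel()
    mask = flat != 0                      # pruned weights stay exactly zero
    nz = flat[mask]
    centroids = pso_init_centroids(nz, 2 ** bits, **pso_kw)
    for _ in range(lloyd_iters):          # standard Lloyd refinement
        assign = np.abs(nz[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(centroids.size):
            members = nz[assign == j]
            if members.size:
                centroids[j] = members.mean()
    q = flat.copy()
    q[mask] = centroids[np.abs(nz[:, None] - centroids[None, :]).argmin(axis=1)]
    return q.reshape(w.shape), centroids

# Example: prune a random layer to 80% sparsity, then 4-bit cluster-quantize it.
w = np.random.randn(64, 64).astype(np.float32)
w_q, codebook = kmeans_quantize(prune_magnitude(w, sparsity=0.8), bits=4)
print("codebook entries:", codebook.size,
      "| distinct nonzero values:", np.unique(w_q[w_q != 0]).size)
```

In this sketch each layer is quantized independently: after clustering, only the 2^b-entry codebook plus a b-bit index per surviving weight would need to be stored, which is where the reduction from 32-bit to 4-bit representation comes from.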
