Abstract
Motivated by the fact that uniform quantization is not well suited to signals with non-uniform probability density functions (pdfs), such as the Laplacian pdf, in this paper we divide the support region of the quantizer into two disjoint regions and apply the simplest uniform quantization, with equal bit-rates, within both. In particular, we assume a narrow central granular region (CGR) covering the peak of the Laplacian pdf and a wider peripheral granular region (PGR) dominated by the tail of the pdf. We optimize the widths of the CGR and PGR by minimizing distortion with respect to the ratio of the region border to the clipping threshold, which yields an iterative formula for parametrizing our piecewise uniform quantizer (PWUQ). For medium and high bit-rates, we demonstrate the advantage of our PWUQ over the uniform quantizer, paying special attention to the case where the support region captures 99.99% of the signal amplitudes, the remainder falling into the clipping region. We believe that the resulting formulas for PWUQ design and performance assessment can be of great benefit in neural networks, where weights and activations are typically modelled by the Laplacian distribution and where uniform quantization is commonly used to decrease the memory footprint.
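The paper derives the PWUQ design analytically; purely as an illustration of the two-region structure described above, the following NumPy sketch quantizes a signal with a narrow CGR and a wider two-tail PGR, assuming N/2 levels per region and midpoint reconstruction. The function name pwuq_quantize and the parameters x1 and xmax are our own labels, not the authors' notation.

```python
import numpy as np

def pwuq_quantize(x, x1, xmax, n_levels):
    """Symmetric piecewise uniform quantizer (illustrative sketch).

    The support region [-xmax, xmax] is split into a central granular
    region (CGR) [-x1, x1] and a peripheral granular region (PGR) made
    of the two tails [x1, xmax] and [-xmax, -x1]. Each region receives
    n_levels // 2 representation levels (equal bit-rates) and is
    quantized uniformly with midpoint reconstruction.
    """
    half = n_levels // 2                      # levels per region
    delta_c = 2.0 * x1 / half                 # CGR step size
    delta_p = (xmax - x1) / (half // 2)       # PGR step size per tail

    x = np.clip(np.asarray(x, dtype=float), -xmax, xmax)  # clipping region
    y = np.empty_like(x)

    central = np.abs(x) <= x1
    # CGR: uniform cells around the origin, index clipped at the border
    i = np.clip(np.floor(x[central] / delta_c), -half // 2, half // 2 - 1)
    y[central] = delta_c * (i + 0.5)
    # PGR: shift each tail to the origin, quantize uniformly, shift back
    t = ~central
    j = np.clip(np.floor((np.abs(x[t]) - x1) / delta_p), 0, half // 2 - 1)
    y[t] = np.sign(x[t]) * (x1 + delta_p * (j + 0.5))
    return y
```

Because the CGR is narrow, its step size delta_c is small where the Laplacian pdf is highest, which is the intuition behind the distortion gain over a single uniform quantizer.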
Highlights
One growing interest in neural networks (NNs) is the efficient representation of weights and activations by means of quantization [1,2,3,4,5,6,7,8,9,10,11,12,13,14].
To improve upon the uniform quantizer in terms of signal-to-quantization-noise ratio (SQNR) while retaining the benefits of the simplest uniform quantizer (UQ), in this paper we propose a piecewise uniform quantizer (PWUQ).
This model deliberately applies equal-bit-rate uniform quantization in a central granular region (CGR) and a peripheral granular region (PGR), whose widths are optimized iteratively so that, for the assumed clipping threshold and Laplacian pdf, the distortion is minimal; a numerical sketch of this border search follows the list.
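The paper obtains the optimal region border through an iterative formula; as a rough numerical stand-in only, the following sketch sweeps candidate borders and keeps the one with the smallest Monte Carlo distortion estimate for a unit-variance Laplacian source. It reuses pwuq_quantize from the sketch above, and the clipping threshold xmax = 5.0 is an illustrative value, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
# Unit-variance Laplacian samples: Var[Laplace(b)] = 2 * b**2, so b = 1/sqrt(2)
x = rng.laplace(scale=1.0 / np.sqrt(2.0), size=1_000_000)

xmax = 5.0                                    # clipping threshold (illustrative)
candidates = np.linspace(0.1, xmax - 0.1, 50) # candidate CGR/PGR borders
mse = [np.mean((x - pwuq_quantize(x, x1, xmax, n_levels=64)) ** 2)
       for x1 in candidates]
x1_opt = candidates[int(np.argmin(mse))]
print(f"border x1 ~ {x1_opt:.3f}, distortion ~ {min(mse):.2e}")
```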
Summary
One growing interest in neural networks (NNs) is the efficient representation of weights and activations by means of quantization [1,2,3,4,5,6,7,8,9,10,11,12,13,14]. Standard implementation of NNs assumes 32-bit full-precision (FP32) representation of NN parameters, requiring complex and expensive hardware. By quantizing FP32 weights and activations at low bit-widths, that is, by thoughtfully choosing a quantizer model for NN parameters, one can significantly reduce the bit-width required for the digital representation of NN parameters, greatly reducing the overall complexity of the NN while degrading the network accuracy to some extent [2,3,5,6,8,9]. Several new quantizer models and quantization methodologies have been proposed, for instance in [4,5,11,13], with the main objective of enabling quantized NNs to achieve nearly the same accuracy level as their full-precision counterparts.
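As a concrete illustration of the memory argument only (a generic symmetric per-tensor int8 scheme of our own, not one of the cited methods), the sketch below quantizes Laplacian-distributed FP32 weights to 8 bits with a single scale factor and reports the footprint reduction.

```python
import numpy as np

# Laplacian-distributed stand-in for FP32 NN weights
w = np.random.default_rng(1).laplace(size=4096).astype(np.float32)

scale = np.abs(w).max() / 127.0               # map [-max, max] onto [-127, 127]
w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_hat = w_q.astype(np.float32) * scale        # dequantized weights

print(f"memory: {w.nbytes} B -> {w_q.nbytes} B (4x smaller)")
print(f"quantization MSE: {np.mean((w - w_hat) ** 2):.2e}")
```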