Abstract

Post-Training Quantization (PTQ) has attracted growing research interest in recent years, and many strong methods have made PTQ far more practical. At low bit-widths, however, a wide gap remains between PTQ and state-of-the-art Quantization-Aware Training (QAT) methods. In this work, we find that the common way of obtaining the activation scale in PTQ is not entirely sound, so the weights cannot adapt well to the biased quantized activations during inference. Based on experiments and analysis, we propose StablePTQ, which obtains a stable activation scale and mixes rich inputs into block reconstruction to improve quantization accuracy. StablePTQ achieves remarkable improvements at several bit-widths, especially W2A4, and can be applied to other quantization algorithms as a plug-and-play approach. The source code of StablePTQ is available at .
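The abstract does not spell out how StablePTQ computes its activation scale, so the following is only a minimal sketch of the general idea it alludes to: deriving one fixed ("stable") activation scale from a calibration set, rather than from whatever single batch happens to be seen. The function names (`calibrated_scale`, `batch_scale`, `fake_quant`), the EMA smoothing, and the toy data are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

def batch_scale(x, n_bits=4):
    """Activation scale from a single batch's max magnitude (a potentially biased estimate)."""
    qmax = 2 ** (n_bits - 1) - 1
    return np.abs(x).max() / qmax

def calibrated_scale(calib_batches, n_bits=4, momentum=0.9):
    """Hypothetical 'stable' scale: an EMA of per-batch scales over the calibration set."""
    scale = None
    for x in calib_batches:
        s = batch_scale(x, n_bits)
        scale = s if scale is None else momentum * scale + (1.0 - momentum) * s
    return scale

def fake_quant(x, scale, n_bits=4):
    """Symmetric fake-quantization of activations with a given scale."""
    qmax = 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

# Toy calibration set whose activation range fluctuates from batch to batch.
rng = np.random.default_rng(0)
calib = [rng.normal(0, 1.0 + 0.3 * rng.random(), size=(64, 128)) for _ in range(32)]

stable_s = calibrated_scale(calib)      # one fixed scale reused at inference
dynamic_s = batch_scale(calib[-1])      # scale tied to a single, possibly unlucky batch

x_test = rng.normal(0, 1.0, size=(64, 128))
err_stable = np.mean((x_test - fake_quant(x_test, stable_s)) ** 2)
err_single = np.mean((x_test - fake_quant(x_test, dynamic_s)) ** 2)
print(f"MSE with calibrated scale:   {err_stable:.6f}")
print(f"MSE with single-batch scale: {err_single:.6f}")
```

Under these assumptions, the calibrated scale is less sensitive to the statistics of any one batch, which is the kind of bias in the activation scale that the abstract says the weights otherwise fail to adapt to.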
