Abstract

Neural network quantization aims to reduce model size, computational complexity, and memory consumption by mapping weights and activations from full precision to low precision. However, many existing quantization methods, whether post-training quantization with calibration or quantization-aware training with fine-tuning, require the original training data for good performance, and such data may be unavailable due to confidentiality or privacy constraints. This lack of data can cause a significant decline in accuracy. In this paper, we propose a universal and effective method, Generative Data Free Model Quantization with Knowledge Matching for Classification (KMDFQ), that removes the dependence on real data for neural network quantization. To this end, we design a knowledge matching generator that produces meaningful fake data from the latent knowledge in the pre-trained model, including classification-boundary knowledge and data-distribution information. Building on this generator, we propose a fake-data-driven, data-free quantization method that exploits this latent knowledge during quantization. Furthermore, we introduce Mean Square Error (MSE) alignment during fine-tuning of the quantized model to learn knowledge from the full-precision model more strictly and directly, which makes the procedure better suited to data-free quantization. Extensive experiments on image classification demonstrate the effectiveness of our method, which achieves higher accuracy than existing data-free quantization methods, particularly as the quantization bit-width decreases. For example, on ImageNet, our 4-bit data-free quantized ResNet-18 suffers less than a 1.2% accuracy drop compared with quantization using real data. The source code is available at https://github.com/ZSHsh98/KMDFQ.
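
The abstract names two ingredients: a knowledge matching generator trained against the pre-trained model's classification-boundary and data-distribution knowledge, and MSE alignment between the quantized model and its full-precision teacher on generated data. The sketch below illustrates both ideas in PyTorch; it is not the authors' implementation. The conditional `generator(z, y)` interface and its `latent_dim` attribute are hypothetical, the batch-norm statistics term is a common choice for matching distribution information in data-free quantization rather than a detail confirmed by the abstract, and the full-precision model is assumed frozen and in eval mode.

```python
# Minimal sketch under stated assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def bn_stat_loss(fp_model, fake_images):
    """Match batch statistics of fake data to the running BatchNorm
    statistics stored in the frozen pre-trained model (assumed proxy
    for the abstract's 'data distribution information')."""
    feats, hooks = [], []

    def hook(module, inputs, output):
        feats.append((module, inputs[0]))  # capture BN input activations

    for m in fp_model.modules():
        if isinstance(m, nn.BatchNorm2d):
            hooks.append(m.register_forward_hook(hook))
    fp_model(fake_images)
    for h in hooks:
        h.remove()

    loss = fake_images.new_zeros(())
    for bn, x in feats:
        mu = x.mean(dim=(0, 2, 3))
        var = x.var(dim=(0, 2, 3), unbiased=False)
        loss = loss + F.mse_loss(mu, bn.running_mean) \
                    + F.mse_loss(var, bn.running_var)
    return loss / max(len(feats), 1)

def generator_step(fp_model, generator, opt_g, batch_size, num_classes, device):
    """One knowledge-matching update of the (hypothetical) conditional
    generator: fake data should (i) be classified by the frozen FP model
    as its conditioning label (boundary knowledge) and (ii) match the
    model's BN statistics (distribution knowledge)."""
    z = torch.randn(batch_size, generator.latent_dim, device=device)
    y = torch.randint(0, num_classes, (batch_size,), device=device)
    fake = generator(z, y)
    loss = F.cross_entropy(fp_model(fake), y) + bn_stat_loss(fp_model, fake)
    opt_g.zero_grad()
    loss.backward()
    opt_g.step()
    return loss.item()

def mse_alignment_step(fp_model, q_model, opt_q, fake_images):
    """Fine-tune the quantized model so its logits match the FP teacher's
    on generated data; MSE is a stricter, more direct target than
    soft-label distillation."""
    with torch.no_grad():
        t_logits = fp_model(fake_images)
    loss = F.mse_loss(q_model(fake_images), t_logits)
    opt_q.zero_grad()
    loss.backward()
    opt_q.step()
    return loss.item()
```

Alternating the two steps gives the usual data-free loop: the generator refreshes the fake batch, and the quantized model is then fine-tuned to reproduce the teacher's logits on it.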
