Abstract

Deploying neural network models to edge devices is becoming increasingly popular because it reduces response time and better preserves data privacy. However, running large models on edge devices is challenging because of their limited computing resources and storage space. Researchers have therefore proposed various model compression methods to reduce model size. To balance the trade-off between model size and accuracy, conventional compression methods require manual effort to find a configuration that reduces model size without significantly degrading accuracy. In this article, we propose a method that automatically finds optimal configurations for quantization. The proposed method suggests multiple compression configurations that produce models with different sizes and accuracies, from which users can select the one that best suits their use case. Additionally, we propose a retraining method that requires no labeled dataset. We evaluated the proposed method on various neural network models for classification, regression, and semantic-similarity tasks, and demonstrated that it reduces model size by at least 30% while keeping the loss of accuracy below 1%. We also compared the proposed method with state-of-the-art automated compression methods and showed that it provides better compression configurations than existing approaches.
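The size/accuracy trade-off that the abstract describes can be illustrated with a minimal sketch: uniform post-training quantization of a weight tensor at different bit-widths, where fewer bits mean a smaller model but a larger reconstruction error. This is a generic illustration under my own assumptions, not the paper's actual search method; the `quantize` function below is hypothetical.

```python
import numpy as np

def quantize(weights, bits):
    """Uniformly quantize a float array to the given bit-width,
    then dequantize, returning the reconstruction and its mean error.
    (Illustrative only; not the method proposed in the paper.)"""
    levels = 2 ** bits - 1
    w_min, w_max = float(weights.min()), float(weights.max())
    scale = (w_max - w_min) / levels
    q = np.round((weights - w_min) / scale)          # integer codes in [0, levels]
    deq = q * scale + w_min                           # back to float range
    error = float(np.abs(weights - deq).mean())
    return deq, error

# Lower bit-widths shrink storage (bits per weight) but raise the error,
# which is the trade-off an automated search must navigate per layer.
rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)
for bits in (8, 4, 2):
    _, err = quantize(w, bits)
    print(f"{bits}-bit: mean abs reconstruction error = {err:.4f}")
```

An automated configuration search such as the one the abstract describes would, conceptually, pick a bit-width per layer so the overall model meets a size budget while the accumulated quantization error stays within an accuracy tolerance.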
