Quantization is a critical technique employed across various research fields for compressing deep neural networks (DNNs) to facilitate deployment within resource-limited environments. This process necessitates a delicate balance between model size and performance. In this work, we explore knowledge distillation (KD) as a promising approach for improving quantization performance by transferring knowledge from high-precision networks to low-precision counterparts. We specifically investigate feature-level information loss during distillation and emphasize the importance of feature-level network quantization perception. We propose a novel quantization method that combines feature-level distillation and contrastive learning to extract and preserve more valuable information during the quantization process. Furthermore, we utilize the hyperbolic tangent function to estimate gradients with respect to the rounding function, which smoothens the training procedure. Our extensive experimental results demonstrate that the proposed approach achieves competitive model performance with the quantized network compared to its full-precision counterpart, thus validating its efficacy and potential for real-world applications.
Read full abstract