Abstract

Low bit-width quantization (e.g., INT8) has recently become common for accelerating deep neural network inference, but low-precision quantization for training has received far less attention. Because backward propagation in deep neural network training is more computationally intensive and consumes more energy than inference, quantizing the backward pass is of great interest both for training very large-scale neural networks and for low-power devices that require online training. However, the distinctive shape and continuous variation of the gradient distribution make gradient quantization difficult, and many studies resort to complex gradient quantization methods to limit the loss of training accuracy. In this paper, we propose two techniques for INT8 quantization training: a Data-aware Dynamic Segmentation Quantization scheme that quantizes diverse, atypical gradient distributions, and an Update Direction Periodic Search strategy that achieves lower quantization error. We then build a distribution-aware INT8 quantization training framework based on these two methods and evaluate it on various models and tasks. Experimental results show that the proposed INT8 quantization training method incurs negligible loss in final training accuracy compared with its full-precision floating-point counterpart across different models, including ResNet, MobileNetV2, VGG, AlexNet, and LSTM. By replacing floating-point computation with 8-bit integer computation for network training, this INT8 quantization training framework opens the possibility of deploying online training directly on low-power devices in the future.
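For intuition, the sketch below shows a generic symmetric per-tensor INT8 quantize/dequantize (fake-quantization) step of the kind commonly applied to gradient tensors in simulated low-precision training. The function name, per-tensor scaling choice, and Laplace-distributed test gradient are illustrative assumptions; this is not the paper's Data-aware Dynamic Segmentation Quantization or Update Direction Periodic Search.

```python
import numpy as np

def int8_quantize_dequantize(grad, eps=1e-8):
    """Symmetric per-tensor INT8 fake quantization of a gradient tensor.

    Returns the dequantized gradient and the scale, so the quantization
    error introduced into the backward pass can be inspected.
    """
    # Map the largest gradient magnitude onto the INT8 range [-127, 127].
    scale = np.max(np.abs(grad)) / 127.0 + eps
    q = np.clip(np.round(grad / scale), -127, 127).astype(np.int8)
    # Dequantize back to float for the weight update (simulated INT8 training).
    return q.astype(np.float32) * scale, scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Gradients in training typically follow sharp, heavy-tailed distributions.
    g = rng.laplace(scale=1e-3, size=(256, 256)).astype(np.float32)
    g_q, s = int8_quantize_dequantize(g)
    print("scale:", s, "max abs error:", np.max(np.abs(g - g_q)))
```

A single per-tensor scale like this is exactly where heavy-tailed gradient distributions cause large quantization error for the many small-magnitude values, which motivates distribution-aware schemes such as the segmentation approach proposed in the paper.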
