Model Compression via Position-Based Scaled Gradient

Jangho Kim, Nojun Kwak, Kiyoon Yoo

doi:10.1109/access.2022.3231455

Open Access

Abstract

We propose the position-based scaled gradient (PSG), which scales the gradient according to the position of the weight vector so that the trained weights become more compression-friendly. First, we show theoretically that applying PSG to standard gradient descent (GD), which we call PSGD, is equivalent to GD in a warped weight space, i.e., a space obtained by warping the original weight space through an appropriately designed invertible function. Second, we show empirically that PSG, acting as a regularizer on the weight vectors, benefits model compression techniques such as quantization, pruning, and knowledge distillation. PSG reduces the gap between the weight distributions of a full-precision model and its compressed counterpart, which allows a single model to be deployed in either uncompressed or compressed mode depending on the available resources. Experimental results on the CIFAR-10/100 and ImageNet datasets demonstrate the effectiveness of PSG for model compression, including iterative pruning and knowledge distillation.
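
As a rough illustration of what scaling a gradient by weight position can look like, the sketch below multiplies each gradient component by a factor that shrinks as the weight approaches a point on a uniform quantization grid. The specific scaling function (distance to the nearest grid point plus a small constant), the uniform grid, and the names nearest_grid_point, psg_scale, psgd_step, step, and eps are all assumptions made for this example; the abstract does not specify them, and the paper's actual scaling function may differ.

```python
import numpy as np

def nearest_grid_point(w, step):
    """Snap each weight to the nearest multiple of `step` (assumed uniform grid)."""
    return np.round(w / step) * step

def psg_scale(w, step, eps=1e-3):
    """Hypothetical position-based scale: distance to the nearest grid point plus eps.

    Weights sitting on a grid point get a scale of about eps (almost frozen);
    weights midway between grid points get the largest scale (step / 2 + eps).
    """
    return np.abs(w - nearest_grid_point(w, step)) + eps

def psgd_step(w, grad, lr, step, eps=1e-3):
    """One gradient-descent update with the gradient scaled elementwise by position."""
    return w - lr * psg_scale(w, step, eps) * grad

if __name__ == "__main__":
    w = np.array([0.50, 0.75, 0.26])   # on-grid, mid-cell, in between
    grad = np.ones_like(w)             # same raw gradient for every weight
    w_new = psgd_step(w, grad, lr=0.1, step=0.5)
    print("scale :", psg_scale(w, step=0.5))
    print("update:", w - w_new)
```

With identical raw gradients, the weight already on a grid point barely moves while the mid-cell weight moves the most, which is the sense in which such a scaling acts as a regularizer that favors quantization-friendly weights. The sketch only demonstrates the update rule; it makes no claim about convergence or accuracy.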
