Abstract

Quantization of weights and activations has been introduced into the optimization of deep neural networks (DNNs) to address the memory consumption and massive computational demands that impede edge deployment of neural networks. Existing quantization approaches can broadly be classified into two categories, uniform and non-uniform quantization, distinguished by whether unequal quantization intervals are used to match the non-uniform distribution of weights. However, conventional techniques train weights layer by layer to fit previously determined quantization points, which makes it difficult to reach optimal points. We propose a framework called Deep Projection (DP) to train quantized weights with adaptive gradients. Gradients and weights are projected into a high-dimensional training space through projection layers, resulting in more complicated update paths and added randomness, which benefit the generalization of models. Weights in the network are updated with values composited from self-updating high-dimensional tensors, so the resulting gradients are smoother and more adaptive to network training. We applied our method to a uniform quantization approach, and the results showed improvements even when the first and last layers were restricted to lower bit-widths; at 4-bit precision, the results are comparable with those of non-uniform quantization methods. Distributions of weights in different quantized networks are analyzed to illustrate the advantages of our method. Moreover, all of these steps can be realized within a conventional training process, with no restrictions on quantization bit-width or quantization function; our framework is readily implemented with a concise architecture.
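
To make the projection idea concrete, the following is a minimal sketch in PyTorch, assuming a linear layer whose stored parameter is a higher-dimensional tensor that is projected down to the weight shape and then uniformly quantized with a straight-through estimator, so gradients update the high-dimensional tensor rather than the quantized weight directly. The class and parameter names (ProjectedQuantLinear, k, n_bits) are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UniformQuantSTE(torch.autograd.Function):
    """Uniform quantization to 2**n_bits levels in [-1, 1] with a
    straight-through gradient (identity on the backward pass)."""

    @staticmethod
    def forward(ctx, w, n_bits):
        levels = 2 ** n_bits - 1
        w = torch.clamp(w, -1.0, 1.0)
        return torch.round((w + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None


class ProjectedQuantLinear(nn.Module):
    """Linear layer whose weight is composited from a higher-dimensional
    tensor P of shape (out, in, k) through a learned projection, then
    quantized; gradients flow to P and the projection, not the
    quantized weight itself (illustrative assumption)."""

    def __init__(self, in_features, out_features, k=4, n_bits=4):
        super().__init__()
        self.P = nn.Parameter(torch.randn(out_features, in_features, k) * 0.1)
        self.proj = nn.Parameter(torch.randn(k) / k)  # projection coefficients
        self.n_bits = n_bits

    def forward(self, x):
        w = torch.einsum('oik,k->oi', self.P, self.proj)   # project down to weight shape
        w = torch.tanh(w)                                   # squash into (-1, 1)
        w_q = UniformQuantSTE.apply(w, self.n_bits)         # uniform quantization
        return F.linear(x, w_q)


# Usage: the layer trains like any other module; only P and proj are updated.
layer = ProjectedQuantLinear(16, 8, k=4, n_bits=4)
out = layer(torch.randn(2, 16))
out.sum().backward()
```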
