Abstract

Quantisation is a commonly applied technique for improving the efficiency of a neural network on edge devices. Many applications also require that a machine learning model extend its learning cycle into the field. Training a quantised neural network on edge devices is a non-trivial task, since the resources available on the edge are limited. Most quantisation-aware training (QAT) methods maintain a quantised model alongside an extra full-precision model, which is used to prevent large accuracy losses. Keeping the full-precision model assumes that the training machine has sufficient memory and energy, an assumption that contradicts the reality of edge AI. In this paper, we propose the Adaptive Precision Training (APT) method, which keeps only a quantised model. The challenge is that a quantised model has difficulty learning because of quantisation underflow. APT employs a metric called Gavg to quantify the learning ability of each layer and dynamically adjusts the per-layer bitwidth to ensure the model can learn effectively. Experiments on image classification and text classification tasks suggest that APT trains quantised models effectively with limited accuracy loss. Compared with traditional 8-bit QAT, APT saves 60-72.5% of the memory space for model parameters. We also investigate the bitwidth necessary for effective training and gain preliminary insights into the relationship between an architecture and its learning ability.
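
The abstract does not specify how Gavg is computed or how bitwidths are updated. The sketch below is a hypothetical illustration only: it assumes Gavg is the mean absolute gradient of a layer's parameters and that a layer's bitwidth is raised when Gavg falls below an underflow threshold. The names `gavg` and `adjust_bitwidths` and the threshold value are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a per-layer bitwidth controller in the spirit of APT.
# Assumption: Gavg = mean absolute gradient of a layer; when it drops below a
# threshold, quantisation underflow is suspected and the layer's bitwidth is raised.
import torch
import torch.nn as nn


def gavg(layer: nn.Module) -> float:
    """Mean absolute gradient over a layer's parameters (assumed definition of Gavg)."""
    grads = [p.grad.abs().mean() for p in layer.parameters() if p.grad is not None]
    return torch.stack(grads).mean().item() if grads else 0.0


def adjust_bitwidths(layers, bitwidths, threshold=1e-4, max_bits=8):
    """Raise a layer's bitwidth when its Gavg suggests it can no longer learn."""
    for i, layer in enumerate(layers):
        if gavg(layer) < threshold and bitwidths[i] < max_bits:
            bitwidths[i] += 1  # grant the layer more precision so gradients survive quantisation
    return bitwidths
```

In a training loop, such a controller would be invoked after the backward pass, and the quantised weights would then be re-derived at the updated per-layer precision.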
