Feature Map-Aware Activation Quantization for Low-bit Neural Networks

Seungjin Lee, Hyun Kim

doi:10.1109/itc-cscc52171.2021.9501414

Abstract

Quantization, the most popular deep neural network (DNN) compression method, reduces computational complexity and saves substantial memory by converting 32-bit floating-point values into low-bit integer values. However, as DNNs are now widely used on mobile and edge devices, which have relatively limited hardware resources, there is demand for more aggressive quantization methods. To meet this need, this paper introduces a method that divides the activation maps of a DNN into several regions according to activation magnitude and quantizes them to 4 bits, setting a scale factor adaptively for each region. When applied to the backbone of YOLACT, a representative instance segmentation model, the proposed method improves both box and mask mAP by approximately 2% compared with naive 4-bit activation quantization.
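To illustrate why per-region scale factors can help, here is a minimal sketch contrasting one scale factor for a whole activation map with one scale factor per region. The abstract does not say how regions are formed or how region membership is stored, so the sketch assumes the following: activations are non-negative (post-ReLU), regions are magnitude buckets split at hypothetical `thresholds`, and each region uses a max-based unsigned 4-bit scale. The names `naive_quantize`, `region_quantize`, `thresholds` and `mse` are illustrative only and do not come from the paper.

```python
from typing import List, Sequence, Tuple

BITS = 4
QMAX = 2 ** BITS - 1  # unsigned 4-bit codes: 0..15


def quantize(values: Sequence[float], scale: float) -> List[int]:
    """Uniform unsigned quantization: round(x / scale), clamped to [0, QMAX]."""
    if scale <= 0:
        return [0 for _ in values]
    return [min(QMAX, max(0, round(v / scale))) for v in values]


def dequantize(codes: Sequence[int], scale: float) -> List[float]:
    """Map integer codes back to real values."""
    return [c * scale for c in codes]


def naive_quantize(acts: Sequence[float]) -> List[float]:
    """Baseline: a single max-based scale factor for the whole activation map."""
    scale = max(acts) / QMAX
    return dequantize(quantize(acts, scale), scale)


def region_quantize(
    acts: Sequence[float], thresholds: Sequence[float]
) -> Tuple[List[float], List[int]]:
    """Hypothetical region-wise scheme (an assumption, not the paper's exact
    method): bucket activations by magnitude at `thresholds`, then give each
    bucket its own max-based scale factor."""
    bounds = [0.0, *thresholds, float("inf")]
    # Region index of each activation: first bucket whose upper bound exceeds it.
    region_of = [
        next(r for r in range(len(bounds) - 1) if v < bounds[r + 1]) for v in acts
    ]
    out = [0.0] * len(acts)
    for r in range(len(bounds) - 1):
        idx = [i for i, ri in enumerate(region_of) if ri == r]
        if not idx:
            continue
        vals = [acts[i] for i in idx]
        scale = max(vals) / QMAX  # adaptive scale factor for this region only
        for i, v in zip(idx, dequantize(quantize(vals, scale), scale)):
            out[i] = v
    return out, region_of


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared reconstruction error."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Mostly small activations plus one large outlier.
    acts = [0.01, 0.02, 0.03, 0.05, 0.08, 0.1, 0.12, 6.0]
    naive = naive_quantize(acts)
    region, region_of = region_quantize(acts, thresholds=[1.0])
    print("naive  MSE:", mse(acts, naive))
    print("region MSE:", mse(acts, region))
```

With a single scale set by the outlier (6.0 / 15 = 0.4), every small activation in this example rounds to zero. Giving the small-value region its own scale preserves them. In this sketch the per-element region index is extra metadata beyond the 4-bit codes. Whether and how the paper avoids that cost is not stated in the abstract.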
