Abstract

Recently, deep convolutional neural networks (CNNs) have achieved remarkable results in various applications. However, the intensive memory accesses required for activations incur considerable energy consumption, posing a great challenge for deploying CNNs on resource-constrained edge devices. Existing research applies dimension reduction and mixed-precision quantization separately to reduce computational complexity, without attending to their interaction. Such a naïve concatenation of compression strategies yields sub-optimal performance. To develop a comprehensive compression framework, we propose an optimization system that jointly considers dimension reduction and mixed-precision quantization, enabled by independent group-wise learnable mixed-precision schemes. Group partitioning is guided by an automatic group partition mechanism that distinguishes compression priorities among channels and balances the trade-off between model accuracy and compressibility. Moreover, to preserve model accuracy under low bit-width quantization, we propose a dynamic bit-width searching technique that enables continuous bit-width reduction. Experimental results show that the proposed system reaches 69.03%/70.73% accuracy with an average of 2.16/2.61 bits per value on ResNet18/MobileNetV2, introducing only about 1% accuracy loss relative to the uncompressed full-precision models. Compared with standalone activation compression schemes, the joint optimization system reduces memory access by 55%/9% (−2.62/−0.27 bits) over dimension reduction alone and by 55%/63% (−2.60/−4.52 bits) over mixed-precision quantization alone on ResNet18/MobileNetV2, with comparable or even higher accuracy.
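
As a rough illustration of independent group-wise learnable mixed-precision quantization, the sketch below fake-quantizes activations with a separate, continuously relaxed bit-width and clipping range per channel group. This is a minimal PyTorch sketch under stated assumptions, not the paper's implementation: the module name GroupWiseActQuant, the group count, and the parameter initializations are illustrative, and in the proposed system the grouping itself would come from the automatic group partition mechanism rather than a fixed even split.

```python
import torch
import torch.nn as nn


class GroupWiseActQuant(nn.Module):
    """Fake-quantize activations with a learnable bit-width per channel group (illustrative)."""

    def __init__(self, num_channels: int, num_groups: int = 4,
                 init_bits: float = 8.0, init_alpha: float = 6.0):
        super().__init__()
        assert num_channels % num_groups == 0, "channels must split evenly into groups"
        self.num_groups = num_groups
        self.group_size = num_channels // num_groups
        # One continuous bit-width and one clipping threshold per group (both learnable).
        self.bits = nn.Parameter(torch.full((num_groups,), init_bits))
        self.alpha = nn.Parameter(torch.full((num_groups,), init_alpha))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) post-ReLU activations; each channel group is quantized
        # with its own (continuously relaxed) bit-width.
        n, c, h, w = x.shape
        xg = x.view(n, self.num_groups, self.group_size, h, w)
        bits = torch.clamp(self.bits, min=2.0).view(1, -1, 1, 1, 1)
        alpha = self.alpha.abs().view(1, -1, 1, 1, 1)
        step = alpha / (2.0 ** bits - 1.0)          # quantization step size per group
        xc = torch.minimum(xg.clamp(min=0.0), alpha) / step
        # Straight-through rounding: forward uses round(), backward sees identity,
        # so gradients still reach `step` (and hence the bit-widths and clip ranges).
        x_int = xc + (torch.round(xc) - xc).detach()
        return (x_int * step).view(n, c, h, w)
```

In such a sketch, adding a memory-access (bit-budget) penalty on the group bit-widths to the training loss would push low-priority groups toward fewer bits; this is the role that the dynamic bit-width searching technique plays in the proposed system.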
