Abstract

Convolutional neural networks (CNNs) have attracted significant attention for real-world artificial intelligence (AI) applications such as image classification and object detection. On the other hand, for better accuracy, the size of CNN parameters (weights) has been increasing, which in turn makes it difficult to run on-device CNN inference on resource-constrained edge devices. Although weight pruning and 5-bit quantization methods have shown promising results, it is still challenging to deploy large CNN models on edge devices. In this paper, we propose an encoding and hardware-based decoding technique that can be applied to 5-bit quantized weight data for on-device CNN inference on resource-constrained edge devices. Given 5-bit quantized weight data, we employ arithmetic coding with range scaling for lossless weight compression, which is performed offline. When executing on-device inference with the underlying CNN accelerator, our hardware decoder enables fast in-situ weight decompression with a small latency overhead. According to our evaluation results with five widely used CNN models, our arithmetic coding-based encoding method applied to 5-bit quantized weights improves the compression ratio by 9.6× while also reducing memory data transfer energy consumption by 89.2%, on average, compared to uncompressed 32-bit floating-point weights. When applying our technique to pruned weights, we obtain 57.5×–112.2× better compression ratios while reducing energy consumption by 98.3%–99.1% compared to 32-bit floating-point weights. In addition, by pipelining the weight decoding and transfer with the CNN execution, the latency overhead of our weight decoding with 16 decoding units (DUs) is only 0.16%–5.48% and 0.16%–0.91% for non-pruned and pruned weights, respectively. Moreover, our proposed technique with a 4-DU decoder reduces system-level energy consumption by 1.1%–9.3%.
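
The offline encoding step described above can be pictured with a small integer arithmetic coder that narrows an interval per symbol and renormalizes it (range scaling) so the arithmetic stays within fixed precision. The Python sketch below is an illustrative model under our own assumptions (a simple frequency-count model, 32-bit coder precision, a bit-pending flush); it is not the paper's exact encoder.

```python
# Minimal sketch of arithmetic coding with range scaling for 5-bit
# quantized weights (symbols 0..31). Illustrative only; the constants,
# frequency model, and flushing details are assumptions, not the paper's design.

PRECISION = 32
WHOLE   = 1 << PRECISION      # size of the integer coding interval
HALF    = WHOLE >> 1
QUARTER = WHOLE >> 2

def build_model(weights, alphabet=32):
    """Cumulative frequency table over the 5-bit alphabet."""
    freq = [1] * alphabet                       # +1 keeps every symbol representable
    for w in weights:
        freq[w] += 1
    cum = [0]
    for f in freq:
        cum.append(cum[-1] + f)                 # cum[s]..cum[s+1] is symbol s's slot
    return cum                                  # note: cum[-1] must stay below QUARTER

def encode(weights, cum):
    total = cum[-1]
    low, high = 0, WHOLE - 1
    pending, bits = 0, []

    def emit(bit):
        nonlocal pending
        bits.append(bit)
        bits.extend([1 - bit] * pending)        # resolve deferred straddling bits
        pending = 0

    for s in weights:
        span = high - low + 1                   # narrow the interval to symbol s
        high = low + span * cum[s + 1] // total - 1
        low  = low + span * cum[s] // total
        while True:                             # range scaling (renormalization)
            if high < HALF:                     # interval in lower half: emit 0
                emit(0)
            elif low >= HALF:                   # interval in upper half: emit 1
                emit(1)
                low -= HALF; high -= HALF
            elif low >= QUARTER and high < 3 * QUARTER:
                pending += 1                    # straddles the middle: defer the bit
                low -= QUARTER; high -= QUARTER
            else:
                break
            low, high = low * 2, high * 2 + 1   # expand the interval
    pending += 1
    emit(0 if low < QUARTER else 1)             # flush: pin down the final interval
    return bits
```

As a usage example, bits = encode(q_weights, build_model(q_weights)) produces a bitstream whose length approaches the entropy of the quantized weights; the frequency table and the weight count must also be available to the decoder.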

Highlights

  • Convolutional neural networks (CNNs) have been widely deployed in many artificial intelligence (AI) applications

  • For in-situ weight decompression on edge devices that contain a convolutional neural network (CNN) accelerator or NPU, we propose a hardware decoder that can decompress the compressed weights with a small latency overhead (a behavioral sketch of this decoding step follows this list)

  • Since the main focus of our technique is resource-constrained edge devices, this small latency overhead is acceptable: the benefits from the reduced memory and storage requirements and the lower memory energy consumption far outweigh the latency overhead
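
As referenced in the second highlight, the step each decoding unit (DU) performs can be modeled in software as the mirror image of the encoder sketched after the abstract. The Python model below is a behavioral sketch under the same assumed constants and frequency table, not the hardware decoder itself; bits read past the end of the stream are treated as '0', which appears to serve the same purpose as the paper's step of appending '0' bits to the end of the encoded bitstream.

```python
# Behavioral sketch of the decoder side (what one decoding unit, DU, would do),
# mirroring the encoder sketch above. Assumed constants and frequency table,
# not the actual hardware decoder.

PRECISION = 32
WHOLE   = 1 << PRECISION
HALF    = WHOLE >> 1
QUARTER = WHOLE >> 2

def decode(bits, cum, n_weights):
    """Recover n_weights 5-bit symbols from the encoded bitstream."""
    total = cum[-1]
    low, high = 0, WHOLE - 1
    pos = 0

    def next_bit():
        nonlocal pos
        b = bits[pos] if pos < len(bits) else 0   # exhausted stream reads as '0'
        pos += 1
        return b

    value = 0
    for _ in range(PRECISION):                    # prime the code register
        value = (value << 1) | next_bit()

    out = []
    for _ in range(n_weights):
        span = high - low + 1
        scaled = ((value - low + 1) * total - 1) // span
        s = 0
        while cum[s + 1] <= scaled:               # locate the symbol's slot
            s += 1
        out.append(s)
        high = low + span * cum[s + 1] // total - 1
        low  = low + span * cum[s] // total
        while True:                               # same range scaling as encoding
            if high < HALF:
                pass
            elif low >= HALF:
                low -= HALF; high -= HALF; value -= HALF
            elif low >= QUARTER and high < 3 * QUARTER:
                low -= QUARTER; high -= QUARTER; value -= QUARTER
            else:
                break
            low, high = low * 2, high * 2 + 1
            value = value * 2 + next_bit()
    return out
```

A plausible reading of the 16-DU configuration in the abstract is that several such decoders run in parallel on independent chunks of the compressed stream while decoding is pipelined with CNN execution, which is how the reported latency overhead stays small.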


Summary

INTRODUCTION

Convolutional neural networks (CNNs) have been widely deployed in many artificial intelligence (AI) applications. As a more aggressive solution, several works have proposed using 5-bit weight elements for deploying CNN models in resource-constrained systems [5], [6]. Although these works have shown successful results in reducing the weight data size, the weight size can be reduced further by applying data encoding schemes such as Huffman coding or arithmetic coding. By storing only the encoded (reduced-size) weight data in the device's memory and/or storage, CNN models can be deployed more cost-efficiently on resource-constrained devices. We therefore introduce an arithmetic coding-based 5-bit quantized weight compression technique for on-device CNN inference on resource-constrained edge devices.
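
To see why encoding can push below 5 bits per weight, one can compare the fixed 5-bit cost with the Shannon entropy of the quantized weight distribution, which for trained CNNs is typically far from uniform. The sketch below is an illustrative calculation on a synthetic, peaked weight distribution (an assumption for demonstration, not data from the paper).

```python
# Rough estimate of how far below 5 bits/weight an entropy coder
# (Huffman or arithmetic) could compress a 5-bit quantized weight tensor.
# Illustrative sketch; the weight tensor here is synthetic, not a real model.
import math
import random
from collections import Counter

random.seed(0)
# Synthetic 5-bit quantized weights with a peaked (non-uniform) distribution,
# loosely mimicking the near-zero concentration of trained CNN weights.
weights = [min(31, max(0, int(random.gauss(16, 3)))) for _ in range(100_000)]

counts = Counter(weights)
n = len(weights)
entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

print("quantized size : 5.00 bits/weight")
print(f"entropy bound  : {entropy:.2f} bits/weight")
print(f"extra reduction: {5.0 / entropy:.2f}x over the 5-bit format")
```

Arithmetic coding approaches this entropy bound more closely than Huffman coding because it is not limited to an integer number of bits per symbol, which helps explain the choice of arithmetic coding over Huffman coding here.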

RELATED WORK
BACKGROUND
ENTROPY-BASED CODING
ARITHMETIC CODING-BASED WEIGHT ENCODING AND DECODING WITH RANGE SCALING
EVALUATION RESULTS
LATENCY OVERHEAD
LATENCY VERSUS RESOURCE USAGE TRADE-OFF
CONCLUSIONS