Abstract

Neural Network Coding and Representation (NNR) is the first international standard for efficient compression of neural networks (NNs). The standard is designed as a toolbox of compression methods that can be combined into coding pipelines. It can either be used as an independent coding framework (with its own bitstream format) or together with external neural network formats and frameworks. To provide the highest degree of flexibility, the network compression methods operate per parameter tensor, which ensures proper decoding even if no structure information is provided. The NNR standard contains compression-efficient quantization and deep context-adaptive binary arithmetic coding (DeepCABAC) as core encoding and decoding technologies, as well as neural network parameter pre-processing methods such as sparsification, pruning, low-rank decomposition, unification, local scaling, and batch norm folding. NNR achieves a compression efficiency of more than 97% for transparent coding cases, i.e., without degrading classification quality such as top-1 or top-5 accuracy. This paper provides an overview of the technical features and characteristics of NNR.
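The per-tensor pipeline sketched in the abstract (quantize each parameter tensor, then entropy-code the integer levels) can be illustrated with a minimal example of uniform nearest-neighbor quantization. This is a hedged sketch of the general technique only: the function names and step size are illustrative and do not reflect the NNR bitstream syntax or the standard's actual quantizer parameters.

```python
import numpy as np

def uniform_nn_quantize(tensor, step):
    """Map each weight to the nearest level on a uniform grid
    with the given step size (nearest-neighbor rounding)."""
    return np.round(tensor / step).astype(np.int32)

def dequantize(levels, step):
    """Reconstruct approximate weights from the integer levels."""
    return levels.astype(np.float32) * step

# Illustrative weights; in NNR, quantization is applied per parameter tensor.
weights = np.array([0.031, -0.12, 0.004, 0.087], dtype=np.float32)
q = uniform_nn_quantize(weights, step=0.01)      # integer levels for entropy coding
w_hat = dequantize(q, step=0.01)                 # values seen at inference time
```

The integer levels `q` are what an entropy coder such as DeepCABAC would subsequently compress; the reconstruction error per weight is bounded by half the step size.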

Highlights

  • The novel standard for Neural Network Compression and Representation (NNR), or Part 17 of ISO/IEC 15938 [1], is the first standard by the ISO/IEC Moving Picture Experts Group (MPEG) that targets the efficient compression and transmission of neural networks (NNs)

  • In order to improve interoperability, two exchange formats have been proposed: (i) the Open Neural Network Exchange format (ONNX) [6], a serialized format based on protobuf that uses strings to identify the types of elements in the graph, and which is widely supported as an import/export format by different frameworks; (ii) the Neural Network Exchange Format (NNEF) [7], an effort by the Khronos Group to define an exchange format so that networks trained with different frameworks can be used for inference on different platforms

  • In order to allow for parallel decoding of large tensors, a block scanning and entry point concept is included in the NNR standard
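The entry-point idea from the last highlight can be sketched generically: a large tensor is split into blocks, each block is coded independently, and the byte offset of each block is recorded so that decoders can seek directly to any block and decode them concurrently. The sketch below is a hypothetical illustration of this concept only; it uses `zlib` as a stand-in for DeepCABAC and does not reproduce the NNR block scanning order or bitstream syntax.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def encode_blocks(rows, rows_per_block=2):
    """Split a tensor (here: a list of row byte-strings) into blocks,
    compress each block independently, and record an entry point
    (byte offset into the stream) for every block."""
    blocks, entry_points, offset = [], [], 0
    for i in range(0, len(rows), rows_per_block):
        payload = zlib.compress(b"".join(rows[i:i + rows_per_block]))
        entry_points.append(offset)
        blocks.append(payload)
        offset += len(payload)
    return b"".join(blocks), entry_points

def decode_parallel(stream, entry_points):
    """Decode all blocks concurrently; the entry points let each worker
    start at its own block without parsing the preceding ones."""
    bounds = entry_points + [len(stream)]
    chunks = [stream[bounds[i]:bounds[i + 1]] for i in range(len(entry_points))]
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(zlib.decompress, chunks))

rows = [bytes([i]) * 8 for i in range(4)]   # toy 4x8 "tensor"
stream, eps = encode_blocks(rows)
decoded = decode_parallel(stream, eps)
```

Because each block is self-contained, a decoder that only needs part of the tensor can also skip directly to the relevant entry point instead of decoding the whole stream.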


Summary

INTRODUCTION

The novel standard for Neural Network Compression and Representation (NNR), or Part 17 of ISO/IEC 15938 [1], is the first standard by the ISO/IEC Moving Picture Experts Group (MPEG) that targets the efficient compression and transmission of neural networks (NNs). The NNR standard provides a compression efficiency of up to 97% for transparent coding use cases, i.e., without degrading the classification and inference capability of the respective NN. This is reflected by the obtained evaluation results, where compression efficiency is analyzed in terms of compressed bitrate vs. original NN bitrate. To achieve this, the NNR standard is designed to provide the highest compression efficiency for deep neural networks by combining preprocessing methods for data reduction, quantization, and context-adaptive binary arithmetic coding (DeepCABAC).
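The efficiency figure quoted above can be made concrete: compression efficiency here is the fraction of the original model size saved, computed from compressed bitrate vs. original NN bitrate. The helper below is an illustrative sketch (its name and the example sizes are not taken from the paper).

```python
def compression_efficiency(original_bytes, compressed_bytes):
    """Fraction of the original size saved by compression,
    e.g. 0.97 means the compressed model is 3% of the original."""
    return 1.0 - compressed_bytes / original_bytes

# Hypothetical example: a 100 MB float32 model compressed to 3 MB.
eff = compression_efficiency(100_000_000, 3_000_000)
```

At 97% efficiency the compressed representation is roughly a 33x size reduction, which is the regime NNR reports for transparent coding.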

RELATED WORK
NNR CODING TOOLS AND FEATURES
Coding Pipelines
Interoperability with Exchange Formats
Decoding Methods
Parallel Decoding
HIGH-LEVEL SYNTAX
Sparsification
Pruning
Low-Rank Decomposition
Unification
Batch Norm Folding
Local Scaling
QUANTIZATION
Uniform Nearest Neighbor Quantization
ENTROPY CODING
Binarization
Context modeling
Arithmetic coding
COMPRESSION PERFORMANCE
Findings
CONCLUSION