Abstract

Neural Network Coding and Representation (NNR) is the first international standard for efficient compression of neural networks (NNs). The standard is designed as a toolbox of compression methods that can be combined into coding pipelines. It can either be used as an independent coding framework (with its own bitstream format) or together with external neural network formats and frameworks. To provide the highest degree of flexibility, the network compression methods operate per parameter tensor, which ensures proper decoding even if no structure information is provided. The NNR standard contains compression-efficient quantization and deep context-adaptive binary arithmetic coding (DeepCABAC) as core encoding and decoding technologies, as well as neural network parameter pre-processing methods such as sparsification, pruning, low-rank decomposition, unification, local scaling, and batch norm folding. NNR achieves a compression efficiency of more than 97% for transparent coding cases, i.e., without degrading classification quality such as top-1 or top-5 accuracy. This paper provides an overview of the technical features and characteristics of NNR.
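The per-tensor pipeline sketched in the abstract (quantize each parameter tensor, then entropy-code the integer levels) can be illustrated with a minimal example of uniform nearest-neighbor quantization. This is a hedged sketch of the general technique only: the function names and step size are illustrative and do not reflect the NNR bitstream syntax or the standard's actual quantizer parameters.

```python
import numpy as np

def uniform_nn_quantize(tensor, step):
    """Map each weight to the nearest level on a uniform grid
    with the given step size (nearest-neighbor rounding)."""
    return np.round(tensor / step).astype(np.int32)

def dequantize(levels, step):
    """Reconstruct approximate weights from the integer levels."""
    return levels.astype(np.float32) * step

# Illustrative weights; in NNR, quantization is applied per parameter tensor.
weights = np.array([0.031, -0.12, 0.004, 0.087], dtype=np.float32)
q = uniform_nn_quantize(weights, step=0.01)      # integer levels for entropy coding
w_hat = dequantize(q, step=0.01)                 # values seen at inference time
```

The integer levels `q` are what an entropy coder such as DeepCABAC would subsequently compress; the reconstruction error per weight is bounded by half the step size.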

Highlights

  • The novel standard for Neural Network Compression and Representation (NNR), or Part 17 of ISO/IEC 15938 [1], is the first standard by the ISO/IEC Moving Picture Experts Group (MPEG) that targets the efficient compression and transmission of neural networks (NNs)

  • In order to improve interoperability, two exchange formats have been proposed: (i) the Open Neural Network Exchange format (ONNX) [6], a serialized format based on protobuf that uses strings to identify the types of elements in the graph, and which is widely supported as an import/export format by different frameworks; (ii) the Neural Network Exchange Format (NNEF) [7], an effort by the Khronos Group to define an exchange format so that networks trained with different frameworks can be used for inference on different platforms

  • In order to allow for parallel decoding of large tensors, a block scanning and entry point concept is included in the NNR standard
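The entry-point idea from the last highlight can be sketched generically: a large tensor is split into blocks, each block is coded independently, and the byte offset of each block is recorded so that decoders can seek directly to any block and decode them concurrently. The sketch below is a hypothetical illustration of this concept only; it uses `zlib` as a stand-in for DeepCABAC and does not reproduce the NNR block scanning order or bitstream syntax.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def encode_blocks(rows, rows_per_block=2):
    """Split a tensor (here: a list of row byte-strings) into blocks,
    compress each block independently, and record an entry point
    (byte offset into the stream) for every block."""
    blocks, entry_points, offset = [], [], 0
    for i in range(0, len(rows), rows_per_block):
        payload = zlib.compress(b"".join(rows[i:i + rows_per_block]))
        entry_points.append(offset)
        blocks.append(payload)
        offset += len(payload)
    return b"".join(blocks), entry_points

def decode_parallel(stream, entry_points):
    """Decode all blocks concurrently; the entry points let each worker
    start at its own block without parsing the preceding ones."""
    bounds = entry_points + [len(stream)]
    chunks = [stream[bounds[i]:bounds[i + 1]] for i in range(len(entry_points))]
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(zlib.decompress, chunks))

rows = [bytes([i]) * 8 for i in range(4)]   # toy 4x8 "tensor"
stream, eps = encode_blocks(rows)
decoded = decode_parallel(stream, eps)
```

Because each block is self-contained, a decoder that only needs part of the tensor can also skip directly to the relevant entry point instead of decoding the whole stream.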


Summary

INTRODUCTION

The novel standard for Neural Network Compression and Representation (NNR), or Part 17 of ISO/IEC 15938 [1], is the first standard by the ISO/IEC Moving Picture Experts Group (MPEG) that targets the efficient compression and transmission of neural networks (NNs). The NNR standard provides a compression efficiency of up to 97% for transparent coding use cases, i.e., without degrading the classification and inference capability of the respective NN. This is reflected by the obtained evaluation results, where compression efficiency is analyzed in terms of compressed bitrate vs. original NN bitrate. To achieve this, the NNR standard is designed to provide the highest compression efficiency for deep neural networks by combining preprocessing methods for data reduction, quantization, and context-adaptive binary arithmetic coding (DeepCABAC).
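The efficiency figure quoted above can be made concrete: compression efficiency here is the fraction of the original model size saved, computed from compressed bitrate vs. original NN bitrate. The helper below is an illustrative sketch (its name and the example sizes are not taken from the paper).

```python
def compression_efficiency(original_bytes, compressed_bytes):
    """Fraction of the original size saved by compression,
    e.g. 0.97 means the compressed model is 3% of the original."""
    return 1.0 - compressed_bytes / original_bytes

# Hypothetical example: a 100 MB float32 model compressed to 3 MB.
eff = compression_efficiency(100_000_000, 3_000_000)
```

At 97% efficiency the compressed representation is roughly a 33x size reduction, which is the regime NNR reports for transparent coding.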

RELATED WORK
NNR CODING TOOLS AND FEATURES
Coding Pipelines
Interoperability with Exchange Formats
Decoding Methods
Parallel Decoding
HIGH-LEVEL SYNTAX
Sparsification
Pruning
Low-Rank Decomposition
Unification
Batch Norm Folding
Local Scaling
QUANTIZATION
Uniform Nearest Neighbor Quantization
ENTROPY CODING
Binarization
Context modeling
Arithmetic coding
COMPRESSION PERFORMANCE
Findings
CONCLUSION