Abstract
We review a class of methods that can be collected under the name nonlinear transform coding (NTC), which over the past few years have become competitive with the best linear transform codecs for images, and have superseded them in terms of rate-distortion performance under established perceptual quality metrics such as MS-SSIM. We assess the empirical rate-distortion performance of NTC with the help of simple example sources, for which the optimal performance of a vector quantizer is easier to estimate than with natural data sources. To this end, we introduce a novel variant of entropy-constrained vector quantization. We provide an analysis of various forms of stochastic optimization techniques for NTC models; review architectures of transforms based on artificial neural networks, as well as learned entropy models; and provide a direct comparison of a number of methods to parameterize the rate-distortion trade-off of nonlinear transforms, introducing a simplified one.
Highlights
There is no end in sight for the world’s reliance on multimedia communication
This paper reviews some of the recent developments in data-driven lossy compression; in particular, we focus on a class of methods that can be collectively called nonlinear transform coding (NTC), providing insights into its capabilities and challenges
In linear transform coding with a Gaussian source assumption, the probabilistic model P in eq (9) is typically considered to be a distribution factorized over each latent dimension, since the Karhunen–Loève Transform (KLT) factorizes the source
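The decorrelating property of the KLT can be checked numerically. The sketch below (a toy 2-D illustration; the covariance matrix and sample size are hypothetical choices, not taken from the paper) transforms a correlated Gaussian source into its eigenbasis and shows that the latent covariance becomes diagonal, which is what justifies a factorized model P over the latent dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical correlated 2-D Gaussian source.
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
x = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=100_000)

# The KLT is the eigenbasis of the source covariance; transforming into it
# decorrelates the source, so a Gaussian latent factorizes over dimensions.
eigvals, eigvecs = np.linalg.eigh(np.cov(x, rowvar=False))
y = x @ eigvecs  # latent representation

latent_cov = np.cov(y, rowvar=False)
print(np.round(latent_cov, 3))  # off-diagonal entries are (numerically) zero
```

Because the eigenvectors are computed from the sample covariance itself, the latent sample covariance is exactly diagonal up to floating-point error; for a Gaussian source, decorrelation implies independence, hence the factorized model.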
Summary
There is no end in sight for the world’s reliance on multimedia communication. Digital devices have been increasingly permeating our daily lives, and with them comes the need to store, send, and receive images and audio ever more efficiently. Transform coding (TC) has been the method of choice for compressing this type of data source. In combination with stochastic optimization methods, such as stochastic gradient descent (SGD), and massively parallel computational hardware, a nearly universal set of tools for function approximation has emerged, and these tools have been applied to data compression [4]–[9]. This paper reviews some of the recent developments in data-driven lossy compression; in particular, we focus on a class of methods that can be collectively called nonlinear transform coding (NTC), providing insights into its capabilities and challenges. The last two sections discuss connections to related work and conclude the paper, respectively.
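The transform-coding pipeline the paper builds on can be sketched in a few lines. Below is a minimal toy illustration, not the paper's method: the affine maps `g_a` and `g_s` stand in for the learned nonlinear analysis and synthesis transforms of NTC, the source is a hypothetical Laplacian, and scalar rounding plays the role of quantization in the latent space:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for the learned transforms (hypothetical; in NTC these are
# neural networks trained with SGD).
def g_a(x, scale=4.0):
    return scale * x          # analysis transform (encoder)

def g_s(y, scale=4.0):
    return y / scale          # synthesis transform (decoder)

x = rng.laplace(size=10_000)  # example source
y_hat = np.round(g_a(x))      # quantized latent representation
x_hat = g_s(y_hat)            # reconstruction

distortion = np.mean((x - x_hat) ** 2)
print(distortion)
```

A larger `scale` quantizes the latent more finely, lowering distortion while increasing the number of distinct latent values that must be entropy-coded; trading these off is exactly the rate-distortion optimization that NTC performs end to end.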