
Learned Progressive Image Compression With Dead-Zone Quantizers

Abstract

Progressive coding is essential to the practical deployment of learned image compression over heterogeneous networks and clients. Existing methods for learned progressive image compression require complex and empirical design to achieve near-optimal rate-distortion performance over a wide range of bit-rates. However, these methods are limited by the implicit learned mechanism based on neural networks and the introduction of uniform quantizers. In this paper, we propose generalized learned progressive image compression with analytic rate-distortion optimization using dead-zone quantizers on the latent representation. Specifically, we reveal that dead-zone quantizers, as a general case of uniform quantizers, are equivalent to uniform quantizers in fixed-rate nonlinear transform coding and can prevent extra redundancy in embedded quantization for progressive coding. Consequently, we propose rate-distortion optimized learned progressive coding by approximating the optimal quantizer in the source spaces using dead-zone quantizers in an analytic manner on the Laplacian source. To the best of our knowledge, this paper is the first to achieve general learned progressive coding from the perspective of optimal quantizers. The proposed method achieves theoretically sound and practically efficient embedded quantization and learned progressive coding of latent representations with improved rate-distortion performance. It can also enable embedded quantization with diverse assignments of truncation points and support flexible configuration of quality layers of varying numbers and at varying target bit-rates. Furthermore, we successfully incorporate the proposed method into existing pre-trained fixed-rate models to realize progressive learned image compression without re-training. Experimental results demonstrate that the proposed method achieves state-of-the-art rate-distortion performance in learned progressive image compression compared with traditional codecs and recent learned methods.
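
The embedded-quantization idea in the abstract can be illustrated with a small sketch. This is not the paper's method: it assumes a common textbook parametrization of a dead-zone quantizer (a rounding `offset` inside `floor`), and the names `deadzone_quantize`, `deadzone_dequantize`, and `drop_bitplanes` are hypothetical. With `offset=0.5` the quantizer reduces to plain uniform rounding; with `offset=0.0` the zero bin is twice as wide as the others, and discarding the least significant bit of each index magnitude yields exactly the indices of the same quantizer at twice the step size.

```python
import numpy as np

def deadzone_quantize(x, step, offset=0.0):
    """Map values to signed integer indices.

    offset=0.5 gives plain uniform rounding; offset=0.0 gives a zero bin
    ("dead zone") of width 2*step. Assumed parametrization, not the paper's.
    """
    x = np.asarray(x, dtype=np.float64)
    magnitude = np.floor(np.abs(x) / step + offset)
    return (np.sign(x) * magnitude).astype(np.int64)

def deadzone_dequantize(q, step, offset=0.0):
    """Reconstruct each nonzero index at the midpoint of its bin; zero stays zero."""
    q = np.asarray(q)
    return np.sign(q) * (np.abs(q) + 0.5 - offset) * step

def drop_bitplanes(q, n):
    """Discard the n least significant bits of each index magnitude."""
    q = np.asarray(q)
    return np.sign(q) * (np.abs(q) >> n)

# Laplacian-distributed stand-in for latent values (illustrative data only).
rng = np.random.default_rng(0)
latents = rng.laplace(scale=1.0, size=1000)

fine = deadzone_quantize(latents, step=0.25)
coarse = deadzone_quantize(latents, step=0.5)

# Truncating one bit-plane of the fine indices reproduces the coarse indices.
print(np.array_equal(drop_bitplanes(fine, 1), coarse))  # prints True
```

The identity holds because `floor(floor(a) / 2) == floor(a / 2)` for non-negative `a`, so a decoder that receives only the leading bit-planes still sees a valid, coarser dead-zone quantization. The same truncation applied to indices from `offset=0.5` does not in general reproduce a uniform quantizer at the doubled step.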

Similar Papers
  • Research Article
  • Cite Count: 66
  • 10.1109/tcsvt.2021.3119660
Learned Block-Based Hybrid Image Compression
  • Jun 1, 2022
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Yaojun Wu + 4 more

Recent works on learned image compression perform encoding and decoding processes in a full-resolution manner, resulting in two problems when deployed for practical applications. First, parallel acceleration of the autoregressive entropy model cannot be achieved due to serial decoding. Second, full-resolution inference often causes the out-of-memory (OOM) problem with limited GPU resources, especially for high-resolution images. Block partition is a good choice to handle the above issues, but it brings about new challenges in reducing the redundancy between blocks and eliminating block effects. To tackle the above challenges, this paper provides a learned block-based hybrid image compression (LBHIC) framework. Specifically, we introduce explicit intra prediction into a learned image compression framework to utilize the relation among adjacent blocks. Superior to context modeling by linear weighting of neighbor pixels in traditional codecs, we propose a contextual prediction module (CPM) to better capture long-range correlations by utilizing the strip pooling to extract the most relevant information in neighboring latent space, thus achieving effective information prediction. Moreover, to alleviate blocking artifacts, we further propose a boundary-aware postprocessing module (BPM) with the edge importance taken into account. Extensive experiments demonstrate that the proposed LBHIC codec outperforms the VVC, with a bit-rate conservation of 4.1%, and reduces the decoding time by approximately 86.7% compared with that of state-of-the-art learned image compression methods.

  • Research Article
  • 10.63328/ijcser-v1ri3p7
Compression of Depper Images for Hybrid Contexts of Picture Order and Recreation
  • Jul 30, 2024
  • International Journal of Computational Science and Engineering Research
  • Rama Kumar N + 4 more

Progressive deep image compression is a method for compressing digital images using deep learning techniques. It is an extension of traditional image compression methods, such as JPEG and PNG, which use a combination of mathematical algorithms to compress images. In progressive deep image compression, a deep neural network is trained to learn how to compress images in a way that preserves image quality while reducing file size. The network is trained in a progressive manner, where the compression quality is gradually increased as the network is trained on more data. The main advantage of progressive deep image compression is that it can achieve higher compression ratios while maintaining image quality compared to traditional methods. This is because the neural network can learn to identify and preserve the most important features of an image while discarding less important information. The use of deep learning in image compression is a rapidly evolving area of research, with many new techniques and algorithms being developed. Progressive deep image compression is one such technique that shows promise in improving the efficiency of image compression for a wide range of applications. Results show the superiority over the existing approaches and performance metrics supports the proposed model in progressive image compression.

  • Research Article
  • Cite Count: 8
  • 10.1109/tcsvt.2024.3401872
NLIC: Non-Uniform Quantization-Based Learned Image Compression
  • Oct 1, 2024
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Ziqing Ge + 4 more

In recent years, Learned Image Compression (LIC) has undergone rapid evolution. However, it is worth noting that most prevalent LIC methodologies still rely on uniform Scalar Quantization (SQ) for latent features. This overlooks the untapped potential of contextual information, which could be leveraged to significantly reduce statistical redundancies. Prior research has explored Vector Quantization (VQ)’s adaptability to diverse data distributions, yet it introduces significant computational complexity into LIC, hindering its practical implementation. Consequently, in this work, we propose the Contextual Sequential Quantization (CSQ) method, which progressively discretizes the latent features of LIC by harnessing content contextual information and image textural priors. Our proposed CSQ signifies progress in LIC by blending the computational efficiency of SQ with a substantial approach towards the adaptability of VQ. We further propose the Center Compensation Module (CCM) based on the proposed CSQ. This module strategically determines adaptive quantization centers, leading to a direct enhancement of reconstruction quality without compromising the bit-rate. Moreover, it is worth noting that existing LIC approaches face challenges in leveraging hyper side information to effectively enhance transformations, which is attributed to the entanglement of the hyperprior generation module with the main transformations. Consequently, we propose to decouple the hyperprior module from main transformations, and design the Hyperprior-Assisted Transformation (HAT) unit to feed hyperprior back into main transformations. This further improves the coding performance. By integrating the proposed CSQ, CCM, and HAT, our proposed Non-uniform quantization-based LIC (NLIC) method attains state-of-the-art rate-distortion (R-D) performance among existing LIC methodologies.

  • Conference Article
  • Cite Count: 3
  • 10.1109/icip40778.2020.9190974
Shrinkage as Activation for Learned Image Compression
  • Oct 1, 2020
  • Ogun Kirmemis + 1 more

With recent advances in learned entropy and context models, the rate-distortion performance of deep learned image compression methods reached or surpassed those of conventional codecs. However, learned image compression is currently more complex and slower than conventional image compression. Learned image and video compression methods almost exclusively employ the generalized divisive normalization (GDN) activation function. This paper investigates the effect of activation function on the performance of image compression in terms of both objective and subjective criteria as well as runtime. In particular, we show that the distribution of latents produced by hard shrinkage fits a Laplacian better, and it is possible to achieve similar rate-distortion and better visual performance using hard shrinkage with lower complexity.
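
For readers unfamiliar with the activation named in this abstract, hard shrinkage zeroes every value whose magnitude does not exceed a threshold and passes the rest through unchanged. The following is a minimal NumPy sketch; the function name `hard_shrink` and the default threshold `lam=0.5` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def hard_shrink(x, lam=0.5):
    """Hard shrinkage: keep x where |x| > lam, otherwise output 0.

    lam is a hypothetical default chosen only for illustration.
    """
    x = np.asarray(x, dtype=np.float64)
    return np.where(np.abs(x) > lam, x, 0.0)

print(hard_shrink([-1.0, -0.3, 0.0, 0.5, 0.8]))
```

Because small activations are set exactly to zero, the output has a spike at zero with tails on either side, which is consistent with the abstract's remark that the resulting latents are fitted well by a Laplacian.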

  • Research Article
  • 10.1609/aaai.v39i10.33100
Few-Shot Domain Adaptation for Learned Image Compression
  • Apr 11, 2025
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Tianyu Zhang + 4 more

Learned image compression (LIC) has achieved state-of-the-art rate-distortion performance, deemed promising for next-generation image compression techniques. However, pre-trained LIC models usually suffer from significant performance degradation when applied to out-of-training-domain images, implying their poor generalization capabilities. To tackle this problem, we propose a few-shot domain adaptation method for LIC by integrating plug-and-play adapters into pre-trained models. Drawing inspiration from the analogy between latent channels and frequency components, we examine domain gaps in LIC and observe that out-of-training-domain images disrupt pre-trained channel-wise decomposition. Consequently, we introduce a method for channel-wise re-allocation using convolution-based adapters and low-rank adapters, which are lightweight and compatible to mainstream LIC schemes. Extensive experiments across multiple domains and multiple representative LIC schemes demonstrate that our method significantly enhances pre-trained models, achieving comparable performance to H.266/VVC intra coding with merely 25 target-domain samples. Additionally, our method matches the performance of full-model finetune while transmitting fewer than 2% of the parameters.

  • Conference Article
  • Cite Count: 9
  • 10.1109/pcs50896.2021.9477479
A Practical Approach for Rate-Distortion-Perception Analysis in Learned Image Compression
  • Jun 1, 2021
  • Ogun Kirmemis + 1 more

Rate-distortion optimization (RDO) of codecs, where distortion is quantified by the mean-square error, has been a standard practice in image/video compression over the years. RDO serves well for optimization of codec performance for evaluation of the results in terms of PSNR. However, it is well known that the PSNR does not correlate well with perceptual evaluation of images; hence, RDO is not well suited for perceptual optimization of codecs. Recently, rate-distortion-perception trade-off has been formalized by taking the Kullback-Leibler (KL) divergence between the distributions of the original and reconstructed images as a perception measure. Learned image compression methods that simultaneously optimize rate, mean-square loss, VGG loss, and an adversarial loss were proposed. Yet, there exists no easy approach to fix the rate, distortion or perception at a desired level in a practical learned image compression solution to perform an analysis of the trade-off between rate, distortion and perception measures. In this paper, we propose a practical approach to fix the rate to carry out perception-distortion analysis at a fixed rate in order to perform perceptual evaluation of image compression results in a principled manner. Experimental results provide several insights for practical rate-distortion-perception analysis in learned image compression.

  • Conference Article
  • Cite Count: 112
  • 10.1109/cvpr.2019.01031
Learning Image and Video Compression Through Spatial-Temporal Energy Compaction
  • Jun 1, 2019
  • Zhengxue Cheng + 3 more

Compression has been an important research topic for many decades, to produce a significant impact on data transmission and storage. Recent advances have shown a great potential of learning image and video compression. Inspired from related works, in this paper, we present an image compression architecture using a convolutional autoencoder, and then generalize image compression to video compression, by adding an interpolation loop into both encoder and decoder sides. Our basic idea is to realize spatial-temporal energy compaction in learning image and video compression. Thereby, we propose to add a spatial energy compaction-based penalty into loss function, to achieve higher image compression performance. Furthermore, based on temporal energy distribution, we propose to select the number of frames in one interpolation loop, adapting to the motion characteristics of video contents. Experimental results demonstrate that our proposed image compression outperforms the latest image compression standard with MS-SSIM quality metric, and provides higher performance compared with state-of-the-art learning compression methods at high bit rates, which benefits from our spatial energy compaction approach. Meanwhile, our proposed video compression approach with temporal energy compaction can significantly outperform MPEG-4 and is competitive with commonly used H.264. Both our image and video compression can produce more visually pleasant results than traditional standards.

  • Research Article
  • 10.1109/tcsvt.2024.3522621
Sparse Point Clouds Assisted Learned Image Compression
  • May 1, 2025
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Yiheng Jiang + 4 more

In the field of autonomous driving, a variety of sensor data types exist, each representing different modalities of the same scene. Therefore, it is feasible to utilize data from other sensors to facilitate image compression. However, few techniques have explored the potential benefits of utilizing inter-modality correlations to enhance the image compression performance. In this paper, motivated by the recent success of learned image compression, we propose a new framework that uses sparse point clouds to assist in learned image compression in the autonomous driving scenario. We first project the 3D sparse point cloud onto a 2D plane, resulting in a sparse depth map. Utilizing this depth map, we proceed to predict camera images. Subsequently, we use these predicted images to extract multi-scale structural features. These features are then incorporated into learned image compression pipeline as additional information to improve the compression performance. Our proposed framework is compatible with various mainstream learned image compression models, and we validate our approach using different existing image compression methods. The experimental results show that incorporating point cloud assistance into the compression pipeline consistently enhances the performance.

  • Book Chapter
  • Cite Count: 2
  • 10.1007/978-3-031-19839-7_16
A Cloud 3D Dataset and Application-Specific Learned Image Compression in Cloud 3D
  • Jan 1, 2022
  • Tianyi Liu + 3 more

In Cloud 3D, such as Cloud Gaming and Cloud Virtual Reality (VR), image frames are rendered and compressed (encoded) in the cloud, and sent to the clients for users to view. For low latency and high image quality, fast, high compression rate, and high-quality image compression techniques are preferable. This paper explores computation time reduction techniques for learned image compression to make it more suitable for cloud 3D. More specifically, we employed slim (low-complexity) and application-specific AI models to reduce the computation time without degrading image quality. Our approach is based on two key insights: (1) as the frames generated by a 3D application are highly homogeneous, application-specific compression models can improve the rate-distortion performance over a general model; (2) many computer-generated frames from 3D applications are less complex than natural photos, which makes it feasible to reduce the model complexity to accelerate compression computation. We evaluated our models on six gaming image datasets. The results show that our approach has similar rate-distortion performance as a state-of-the-art learned image compression algorithm, while obtaining about 5x to 9x speedup and reducing the compression time to be less than 1 s (0.74 s), bringing learned image compression closer to being viable for cloud 3D. Code is available at https://github.com/cloud-graphics-rendering/AppSpecificLIC.

Keywords: Cloud gaming; Cloud virtual reality; Learned image compression; Model simplification; Application-specific modeling; Model-task balance

  • Research Article
  • Cite Count: 9
  • 10.1016/j.neucom.2022.07.065
Successive learned image compression: Comprehensive analysis of instability
  • Jul 22, 2022
  • Neurocomputing
  • Jun-Hyuk Kim + 3 more


  • Research Article
  • Cite Count: 20
  • 10.3390/rs15082211
Remote Sensing Image Compression Based on the Multiple Prior Information
  • Apr 21, 2023
  • Remote Sensing
  • Chuan Fu + 1 more

Learned image compression has achieved a series of breakthroughs for nature images, but there is little literature focusing on high-resolution remote sensing image (HRRSI) datasets. This paper focuses on designing a learned lossy image compression framework for compressing HRRSIs. Considering the local and non-local redundancy contained in HRRSI, a mixed hyperprior network is designed to explore both the local and non-local redundancy in order to improve the accuracy of entropy estimation. In detail, a transformer-based hyperprior and a CNN-based hyperprior are fused for entropy estimation. Furthermore, to reduce the mismatch between training and testing, a three-stage training strategy is introduced to refine the network. In this training strategy, the entire network is first trained, and then some sub-networks are fixed while the others are trained. To evaluate the effectiveness of the proposed compression algorithm, the experiments are conducted on an HRRSI dataset. The results show that the proposed algorithm achieves comparable or better compression performance than some traditional and learned image compression algorithms, such as Joint Photographic Experts Group (JPEG) and JPEG2000. At a similar or lower bitrate, the proposed algorithm is about 2 dB higher than the PSNR value of JPEG2000.

  • Research Article
  • Cite Count: 22
  • 10.1016/j.sigpro.2022.108778
Learned image compression with generalized octave convolution and cross-resolution parameter estimation
  • Sep 12, 2022
  • Signal Processing
  • Haisheng Fu + 1 more


  • Research Article
  • Cite Count: 159
  • 10.1109/tcsvt.2021.3089491
Causal Contextual Prediction for Learned Image Compression
  • Apr 1, 2022
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Zongyu Guo + 3 more

Over the past several years, we have witnessed impressive progress in the field of learned image compression. Recent learned image codecs are commonly based on autoencoders, that first encode an image into low-dimensional latent representations and then decode them for reconstruction purposes. To capture spatial dependencies in the latent space, prior works exploit hyperprior and spatial context model to build an entropy model, which estimates the bit-rate for end-to-end rate-distortion optimization. However, such an entropy model is suboptimal from two aspects: (1) It fails to capture spatially global correlations among the latents. (2) Cross-channel relationships of the latents are still underexplored. In this paper, we propose the concept of separate entropy coding to leverage a serial decoding process for causal contextual entropy prediction in the latent space. A causal context model is proposed that separates the latents across channels and makes use of cross-channel relationships to generate highly informative contexts. Furthermore, we propose a causal global prediction model, which is able to find global reference points for accurate predictions of unknown points. Both these two models facilitate entropy estimation without the transmission of overhead. In addition, we further adopt a new separate attention module to build more powerful transform networks. Experimental results demonstrate that our full image compression model outperforms standard VVC/H.266 codec on Kodak dataset in terms of both PSNR and MS-SSIM, yielding the state-of-the-art rate-distortion performance.

  • Research Article
  • 10.3390/jimaging12010012
Patched-Based Swin Transformer Hyperprior for Learned Image Compression
  • Dec 26, 2025
  • Journal of Imaging
  • Sibusiso B Buthelezi + 1 more

We present a hybrid end-to-end learned image compression framework that combines a CNN-based variational autoencoder (VAE) with an efficient hierarchical Swin Transformer to address the limitations of existing entropy models in capturing global dependencies under computational constraints. Traditional VAE-based codecs typically rely on CNN-based priors with localized receptive fields, which are insufficient for modelling the complex, high-dimensional dependencies of the latent space, thereby limiting compression efficiency. While fully global transformer-based models can capture long-range dependencies, their high computational complexity makes them impractical for high-resolution image compression. To overcome this trade-off, our approach couples a CNN-based VAE with a patch-based hierarchical Swin Transformer hyperprior that employs shifted window self-attention to effectively model both local and global contextual information while maintaining computational efficiency. The proposed framework tightly integrates this expressive entropy model with an end-to-end differentiable quantization module, enabling joint optimization of the complete rate-distortion objective. By learning a more accurate probability distribution of the latent representation, the model achieves improved bitrate estimation and a more compact latent representation, resulting in enhanced compression performance. We validate our approach on the widely used Kodak, JPEG AI, and CLIC datasets, demonstrating that the proposed hybrid architecture achieves superior rate-distortion performance, delivering higher visual quality at lower bitrates compared to methods relying on simpler CNN-based entropy priors. This work demonstrates the effectiveness of integrating efficient transformer architectures into learned image compression and highlights their potential for advancing entropy modelling beyond conventional CNN-based designs.

  • Conference Article
  • Cite Count: 302
  • 10.1109/cvpr46437.2021.01453
Checkerboard Context Model for Efficient Learned Image Compression
  • Jun 1, 2021
  • Dailan He + 4 more

For learned image compression, the autoregressive context model has proved effective in improving rate-distortion (RD) performance because it helps remove spatial redundancies among latent representations. However, the decoding process must be done in a strict scan order, which breaks the parallelization. We propose a parallelizable checkerboard context model (CCM) to solve the problem. Our two-pass checkerboard context calculation eliminates such limitations on spatial locations by re-organizing the decoding order. Speeding up the decoding process more than 40 times in our experiments, it achieves significantly improved computational efficiency with almost the same rate-distortion performance. To the best of our knowledge, this is the first exploration on parallelization-friendly spatial context model for learned image compression.
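
The two-pass ordering described in this abstract can be pictured with a checkerboard split of the latent grid. The sketch below only builds the two position sets and checks the property that makes the second pass parallelizable; it does not implement the paper's context model. The names `checkerboard_masks` and `neighbours_are_anchors` are hypothetical, and which parity is treated as the first-pass ("anchor") set is an arbitrary assumption.

```python
import numpy as np

def checkerboard_masks(height, width):
    """Split a height x width grid into two interleaved position sets.

    Positions with even (row + col) are treated as first-pass "anchors";
    this parity choice is an assumption made for illustration.
    """
    rows, cols = np.indices((height, width))
    anchor = (rows + cols) % 2 == 0
    return anchor, ~anchor

def neighbours_are_anchors(anchor):
    """Check that every second-pass position has only anchors as 4-neighbours."""
    h, w = anchor.shape
    for r in range(h):
        for c in range(w):
            if anchor[r, c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and not anchor[rr, cc]:
                    return False
    return True

anchor, non_anchor = checkerboard_masks(4, 6)
print(anchor.astype(int))
print(neighbours_are_anchors(anchor))  # prints True
```

Since each second-pass position depends only on first-pass positions, all positions within a pass can be processed at once, in contrast to a raster-scan order where each position waits for the one before it.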
