
Learned Block-Based Hybrid Image Compression

Abstract

Recent works on learned image compression perform encoding and decoding at full resolution, which causes two problems in practical deployment. First, the autoregressive entropy model cannot be accelerated in parallel because decoding is serial. Second, full-resolution inference often runs out of memory (OOM) on GPUs with limited resources, especially for high-resolution images. Block partitioning is a good way to handle both issues, but it introduces new challenges: reducing the redundancy between blocks and eliminating blocking artifacts. To tackle these challenges, this paper provides a learned block-based hybrid image compression (LBHIC) framework. Specifically, we introduce explicit intra prediction into a learned image compression framework to exploit the relation among adjacent blocks. Going beyond the context modeling of traditional codecs, which linearly weights neighboring pixels, we propose a contextual prediction module (CPM) that better captures long-range correlations by using strip pooling to extract the most relevant information from the neighboring latent space, thus achieving effective information prediction. Moreover, to alleviate blocking artifacts, we further propose a boundary-aware postprocessing module (BPM) that takes edge importance into account. Extensive experiments demonstrate that the proposed LBHIC codec outperforms VVC, with a bit-rate saving of 4.1%, and reduces the decoding time by approximately 86.7% compared with that of state-of-the-art learned image compression methods.
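To make the block-partition idea concrete, the following is a minimal sketch in plain Python, not the LBHIC implementation. It assumes a single-channel image stored as a list of rows, and the helper names `split_into_blocks` and `merge_blocks` are hypothetical. Edge tiles are zero-padded here for simplicity; the abstract does not say how the paper handles image borders.

```python
def split_into_blocks(image, block):
    """Split an H x W image (list of rows) into block x block tiles.

    Tiles are returned in raster order. Edge tiles are zero-padded
    (an assumption of this sketch) so that every tile has the same shape.
    """
    height, width = len(image), len(image[0])
    rows = -(-height // block)  # ceiling division
    cols = -(-width // block)
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tile = [[0] * block for _ in range(block)]
            for y in range(block):
                for x in range(block):
                    yy, xx = r * block + y, c * block + x
                    if yy < height and xx < width:
                        tile[y][x] = image[yy][xx]
            tiles.append(tile)
    return tiles, rows, cols


def merge_blocks(tiles, cols, height, width):
    """Reassemble raster-ordered tiles and crop away the padding."""
    block = len(tiles[0])
    image = [[0] * width for _ in range(height)]
    for index, tile in enumerate(tiles):
        r, c = divmod(index, cols)
        for y in range(block):
            for x in range(block):
                yy, xx = r * block + y, c * block + x
                if yy < height and xx < width:
                    image[yy][xx] = tile[y][x]
    return image


# Usage: a 5 x 7 toy image split into 4 x 4 tiles and merged back.
toy = [[y * 7 + x for x in range(7)] for y in range(5)]
tiles, rows, cols = split_into_blocks(toy, 4)
restored = merge_blocks(tiles, cols, 5, 7)
```

Because each tile has a fixed size, peak memory depends on the block size rather than on the image resolution, and tiles can be processed independently. Partitioning alone does not address inter-block redundancy or blocking artifacts, which is the gap the abstract says the CPM and BPM are meant to close.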

Similar Papers
  • Research Article
  • Citations: 16
  • 10.1109/tcsvt.2022.3229701
Learned Progressive Image Compression With Dead-Zone Quantizers
  • Jun 1, 2023
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Shaohui Li + 5 more

Progressive coding is essential to the practical deployment of learned image compression over heterogeneous networks and clients. Existing methods for learned progressive image compression require complex and empirical design to achieve near-optimal rate-distortion performance over a wide range of bit-rates. However, these methods are limited by the implicit learned mechanism based on neural networks and by the introduction of uniform quantizers. In this paper, we propose generalized learned progressive image compression with analytic rate-distortion optimization using dead-zone quantizers on the latent representation. Specifically, we reveal that dead-zone quantizers, as a general case of uniform quantizers, are equivalent to uniform quantizers in fixed-rate nonlinear transform coding and can prevent extra redundancy in embedded quantization for progressive coding. Consequently, we propose rate-distortion optimized learned progressive coding by approximating the optimal quantizer in the source spaces using dead-zone quantizers in an analytic manner on the Laplacian source. To the best of our knowledge, this paper is the first to achieve general learned progressive coding from the perspective of optimal quantizers. The proposed method achieves theoretically sound and practically efficient embedded quantization and learned progressive coding of latent representations with improved rate-distortion performance. It can also enable embedded quantization with diverse assignments of truncation points and support flexible configuration of quality layers of varying numbers and at varying target bit-rates. Furthermore, we successfully incorporate the proposed method into existing pre-trained fixed-rate models to realize progressive learned image compression without re-training. Experimental results demonstrate that the proposed method achieves state-of-the-art rate-distortion performance in learned progressive image compression compared with traditional codecs and recent learned methods.
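As a rough illustration of how a dead-zone quantizer generalizes a uniform one, here is a minimal sketch using the common rounding-offset formulation. The abstract does not give the paper's exact parametrization, so the function names and the `offset` parameter are assumptions of this sketch. With `offset=0.5` it reduces to ordinary uniform rounding; a smaller offset widens the zero bin (the dead zone).

```python
import math


def deadzone_quantize(x, step, offset=0.5):
    """Map x to an integer index with a rounding-offset dead-zone quantizer.

    index = sign(x) * floor(|x| / step + offset)
    offset = 0.5 gives uniform rounding; offset < 0.5 enlarges the zero bin
    to |x| < (1 - offset) * step.
    """
    sign = -1 if x < 0 else 1
    return sign * int(math.floor(abs(x) / step + offset))


def dequantize(index, step):
    """Reconstruct at the bin's nominal position (no reconstruction offset)."""
    return index * step


# Usage: the same input lands in different bins as the dead zone widens.
uniform_index = deadzone_quantize(0.6, 1.0, offset=0.5)    # 1
deadzone_index = deadzone_quantize(0.6, 1.0, offset=0.25)  # 0
```

Small-magnitude latents are pushed to zero more often as the offset shrinks, which typically lowers the rate at the cost of some distortion.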

  • Research Article
  • Citations: 3
  • 10.1109/tip.2025.3567830
Approximately Invertible Neural Network for Learned Image Compression.
  • Jan 1, 2025
  • IEEE Transactions on Image Processing
  • Yanbo Gao + 7 more

Learned image compression has attracted considerable interest in recent years. An analysis transform and a synthesis transform, which can be regarded as coupled transforms, are used to encode an image into a latent feature and, after quantization, decode the feature to reconstruct the image. Inspired by the success of invertible neural networks in generative modeling, invertible modules can be used to construct the coupled analysis and synthesis transforms. Considering that the noise introduced by feature quantization invalidates the invertible process, this paper proposes an Approximately Invertible Neural Network (A-INN) framework for learned image compression. It formulates the rate-distortion optimization in lossy image compression when using an INN with quantization, which differs from using an INN for generative modeling. Generally speaking, A-INN can be used as the theoretical foundation for any INN-based lossy compression method. Based on this formulation, A-INN with a progressive denoising module (PDM) is developed to effectively reduce the quantization noise in decoding. Moreover, a Cascaded Feature Recovery Module (CFRM) is designed to learn high-dimensional feature recovery from low-dimensional ones to further reduce the noise in feature channel compression. In addition, a Frequency-enhanced Decomposition and Synthesis Module (FDSM) is developed by explicitly enhancing the high-frequency components in an image to address the loss of high-frequency information inherent in neural-network-based image compression, thereby enhancing the reconstructed image quality. Extensive experiments demonstrate that the proposed A-INN framework achieves compression efficiency better than or comparable to that of the conventional image compression approach and state-of-the-art learned image compression methods.

  • Research Article
  • Citations: 20
  • 10.3390/rs15082211
Remote Sensing Image Compression Based on the Multiple Prior Information
  • Apr 21, 2023
  • Remote Sensing
  • Chuan Fu + 1 more

Learned image compression has achieved a series of breakthroughs for natural images, but there is little literature focusing on high-resolution remote sensing image (HRRSI) datasets. This paper focuses on designing a learned lossy image compression framework for compressing HRRSIs. Considering the local and non-local redundancy contained in HRRSIs, a mixed hyperprior network is designed to exploit both kinds of redundancy in order to improve the accuracy of entropy estimation. In detail, a transformer-based hyperprior and a CNN-based hyperprior are fused for entropy estimation. Furthermore, to reduce the mismatch between training and testing, a three-stage training strategy is introduced to refine the network. In this training strategy, the entire network is first trained, and then some sub-networks are fixed while the others are trained. To evaluate the effectiveness of the proposed compression algorithm, experiments are conducted on an HRRSI dataset. The results show that the proposed algorithm achieves comparable or better compression performance than some traditional and learned image compression algorithms, such as Joint Photographic Experts Group (JPEG) and JPEG2000. At a similar or lower bitrate, the proposed algorithm achieves a PSNR about 2 dB higher than that of JPEG2000.
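For readers who want to interpret the reported PSNR gap, the standard definition is PSNR = 10 log10(peak^2 / MSE). The sketch below assumes 8-bit samples (peak 255) passed as flat lists; the function names are hypothetical and the example values are illustrative, not data from the paper.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized flat sample lists."""
    return sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    error = mse(reference, test)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / error)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)


# Usage: a 2 dB PSNR gain corresponds to roughly 1.58x lower MSE.
ratio = mse_ratio_for_gain(2.0)
```

Under this definition, a gain of about 2 dB at the same bitrate means the mean squared reconstruction error is lower by a factor of roughly 1.58.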

  • Dissertation
  • 10.33915/etd.13084
Neural Network-based Image Compression
  • Jan 1, 2025
  • Atefeh Khoshkhahtinat

The rapid advancement of information technology and the exponential growth of digital communication have significantly increased the demand for efficient data compression techniques that reduce storage requirements, minimize bandwidth consumption, and accelerate data transmission—without substantially compromising data quality. This dissertation addresses these challenges by investigating and developing advanced learned image compression (LIC) methods, with a particular focus on lossy compression for both natural images and scientific imagery obtained from NASA’s Solar Dynamics Observatory (SDO) mission. Traditional image compression standards—such as JPEG, JPEG2000, BPG, and HEVC—rely on manually engineered transforms and heuristic rules, which often lack the adaptability required to accommodate diverse visual content and application-specific constraints. In contrast, learned image compression employs deep neural networks trained in an end-to-end manner, guided by principles from rate–distortion theory, to optimize the trade-off between compression efficiency and reconstruction fidelity. In the first part of this dissertation, several technical challenges in developing neural image compression codecs for natural images (general-purpose) are addressed, including the design of expressive nonlinear transforms, accurate entropy modeling, and the integration of perceptually meaningful loss functions. To this end, several learned image compression frameworks are proposed, each introducing distinct design innovations: a Transformer-based nonlinear transform that captures both local and global dependencies, an advanced entropy model that improves probability estimation and coding efficiency, and a conditional diffusion-based generative framework that enhances the perceptual quality of reconstructed images. The second part focuses on the application of learned compression to imagery from NASA’s Solar Dynamics Observatory (SDO) mission. 
A learned video compression framework is developed to exploit both spatial and temporal redundancies in solar image sequences. Furthermore, an adaptive compression strategy is introduced to prioritize scientific relevance: images containing solar flare events are compressed at lower ratios to preserve critical information, whereas non-flare images are compressed more aggressively to maximize storage and transmission efficiency. Collectively, these contributions advance the field of learned image compression across both general-purpose and scientific imaging domains, providing practical solutions for improving data transmission and storage efficiency in real-world and mission-critical environments.

  • Research Article
  • 10.1109/tip.2025.3598916
Rate-Distortion-Complexity Optimized Framework for Multi-Model Image Compression.
  • Jan 1, 2025
  • IEEE Transactions on Image Processing
  • Xinyu Hang + 5 more

Learned Image Compression (LIC) has experienced rapid growth with the emergence of diverse frameworks. However, the variability in model design and training datasets poses a challenge for the universal application of a single coding model. To address this problem, this paper introduces a pioneering multi-model image coding framework that integrates various image codecs to overcome these limitations. By dynamically allocating codecs to different image regions, our framework optimizes reconstruction quality within the constraints of limited bitrate and decoding time, offering a high-performance, ubiquitous solution for the rate-distortion-complexity trade-off. Our framework features a detailed codec assignment algorithm based on the Simulated Annealing (SA) method, selected for its proven efficacy in managing the discrete and intricate nature of codec assignment optimization. We have implemented a coarse-to-fine strategy, which significantly enhances efficiency. Notably, our framework maintains compatibility with all standard image codecs without necessitating structural modifications. Empirical results indicate that our framework establishes a new standard in LIC, advancing the Pareto frontier for performance-complexity trade-offs. It achieves a significant 70% reduction in decoding time compared to current state-of-the-art methods, without compromising reconstruction quality. Furthermore, under comparable conditions, our approach not only outperforms but significantly eclipses existing Rate-Distortion-Complexity (RDC) optimized codecs, with decoding speeds up to 30 times faster.
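The codec-assignment idea can be sketched with a generic simulated annealing loop. This is a toy illustration under stated assumptions, not the paper's algorithm: the per-region `distortion` and `time_cost` tables, the linear cooling schedule, the penalty weight, and the function name `anneal_assignment` are all hypothetical, and the coarse-to-fine strategy mentioned in the abstract is not modeled.

```python
import math
import random


def anneal_assignment(distortion, time_cost, time_budget,
                      steps=2000, penalty=1000.0, seed=0):
    """Pick one codec per region to minimize total distortion under a time budget.

    distortion[i][k] and time_cost[i][k] describe codec k on region i.
    Exceeding the budget is discouraged with a linear penalty (an assumption).
    """
    rng = random.Random(seed)
    n_regions, n_codecs = len(distortion), len(distortion[0])

    def cost(assign):
        total_d = sum(distortion[i][k] for i, k in enumerate(assign))
        total_t = sum(time_cost[i][k] for i, k in enumerate(assign))
        return total_d + penalty * max(0.0, total_t - time_budget)

    current = [0] * n_regions
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        temperature = max(1e-6, 1.0 - step / steps)  # linear cooling
        candidate = list(current)
        candidate[rng.randrange(n_regions)] = rng.randrange(n_codecs)
        delta = cost(candidate) - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost


# Usage: codec 0 is fast but lossy, codec 1 is slow but accurate.
distortion = [[5.0, 1.0], [5.0, 1.0], [5.0, 1.0]]
time_cost = [[1.0, 4.0], [1.0, 4.0], [1.0, 4.0]]
assignment, total = anneal_assignment(distortion, time_cost, time_budget=9.0)
```

Simulated annealing suits this problem because the search space is discrete (one codec label per region) and the objective is not differentiable with respect to the assignment.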

  • Conference Article
  • Citations: 112
  • 10.1109/cvpr.2019.01031
Learning Image and Video Compression Through Spatial-Temporal Energy Compaction
  • Jun 1, 2019
  • Zhengxue Cheng + 3 more

Compression has been an important research topic for many decades, with a significant impact on data transmission and storage. Recent advances have shown the great potential of learned image and video compression. Inspired by related work, in this paper we present an image compression architecture using a convolutional autoencoder, and then generalize image compression to video compression by adding an interpolation loop to both the encoder and decoder sides. Our basic idea is to realize spatial-temporal energy compaction in learned image and video compression. To that end, we propose to add a spatial energy compaction-based penalty to the loss function to achieve higher image compression performance. Furthermore, based on the temporal energy distribution, we propose to select the number of frames in one interpolation loop, adapting to the motion characteristics of the video content. Experimental results demonstrate that our proposed image compression outperforms the latest image compression standard under the MS-SSIM quality metric, and provides higher performance than state-of-the-art learned compression methods at high bit rates, which benefits from our spatial energy compaction approach. Meanwhile, our proposed video compression approach with temporal energy compaction can significantly outperform MPEG-4 and is competitive with the commonly used H.264. Both our image and video compression can produce more visually pleasing results than traditional standards.

  • Research Article
  • 10.1109/tcsvt.2024.3522621
Sparse Point Clouds Assisted Learned Image Compression
  • May 1, 2025
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Yiheng Jiang + 4 more

In the field of autonomous driving, a variety of sensor data types exist, each representing a different modality of the same scene. Therefore, it is feasible to utilize data from other sensors to facilitate image compression. However, few techniques have explored the potential benefits of utilizing inter-modality correlations to enhance image compression performance. In this paper, motivated by the recent success of learned image compression, we propose a new framework that uses sparse point clouds to assist learned image compression in the autonomous driving scenario. We first project the 3D sparse point cloud onto a 2D plane, resulting in a sparse depth map. Utilizing this depth map, we proceed to predict camera images. Subsequently, we use these predicted images to extract multi-scale structural features. These features are then incorporated into the learned image compression pipeline as additional information to improve the compression performance. Our proposed framework is compatible with various mainstream learned image compression models, and we validate our approach using different existing image compression methods. The experimental results show that incorporating point cloud assistance into the compression pipeline consistently enhances the performance.

  • Research Article
  • Citations: 9
  • 10.1016/j.neucom.2022.07.065
Successive learned image compression: Comprehensive analysis of instability
  • Jul 22, 2022
  • Neurocomputing
  • Jun-Hyuk Kim + 3 more


  • Research Article
  • Citations: 22
  • 10.1016/j.sigpro.2022.108778
Learned image compression with generalized octave convolution and cross-resolution parameter estimation
  • Sep 12, 2022
  • Signal Processing
  • Haisheng Fu + 1 more


  • Conference Article
  • Citations: 3
  • 10.1109/icip40778.2020.9190974
Shrinkage as Activation for Learned Image Compression
  • Oct 1, 2020
  • Ogun Kirmemis + 1 more

With recent advances in learned entropy and context models, the rate-distortion performance of deep learned image compression methods has reached or surpassed that of conventional codecs. However, learned image compression is currently more complex and slower than conventional image compression. Learned image and video compression methods almost exclusively employ the generalized divisive normalization (GDN) activation function. This paper investigates the effect of the activation function on the performance of image compression in terms of both objective and subjective criteria as well as runtime. In particular, we show that the distribution of latents produced by hard shrinkage fits a Laplacian better, and that it is possible to achieve similar rate-distortion and better visual performance using hard shrinkage with lower complexity.
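For reference, hard shrinkage is the elementwise function that passes a value through unchanged when its magnitude exceeds a threshold and outputs zero otherwise. The sketch below is a plain-Python illustration; the threshold value 0.5 and the function names are assumptions, since the abstract does not state the settings used in the paper.

```python
def hard_shrink(x, threshold=0.5):
    """Return x if |x| > threshold, else 0.0."""
    return x if abs(x) > threshold else 0.0


def hard_shrink_all(values, threshold=0.5):
    """Apply hard shrinkage elementwise to a list of activations."""
    return [hard_shrink(v, threshold) for v in values]


# Usage: small-magnitude activations are zeroed, large ones pass through.
activations = [-1.2, -0.3, 0.0, 0.4, 0.9]
shrunk = hard_shrink_all(activations)  # [-1.2, 0.0, 0.0, 0.0, 0.9]
```

Unlike GDN, which normalizes each channel using a learned combination of the other channels, this function needs only a comparison per element, which is consistent with the lower complexity the abstract reports.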

  • Conference Article
  • Citations: 7
  • 10.1109/icip42928.2021.9506076
Learned Image Compression with Channel-Wise Grouped Context Modeling
  • Sep 19, 2021
  • Liang Yuan + 6 more

Learned image compression has achieved improved rate-distortion performance with end-to-end optimized frameworks based on deep neural networks. However, context-based entropy modeling for learned image compression cannot simultaneously achieve high efficiency and sufficiently exploit channel-wise correlations. In this paper, we propose a novel framework for learned image compression with channel-wise grouped context modeling. The proposed framework uses channel-wise grouping to explicitly exploit the channel-wise correlations and develops a grouped 3-D context model to achieve efficient entropy coding with a guarantee of rate-distortion performance. The proposed framework achieves competitive performance with significantly reduced decoding complexity in comparison to 3-D context models.

  • Conference Article
  • Citations: 7
  • 10.1109/vcip49819.2020.9301767
Volumetric End-to-End Optimized Compression for Brain Images
  • Dec 1, 2020
  • Shuo Gao + 3 more

The amount of volumetric brain image data is increasing rapidly and requires vast resources for storage and transmission, so an efficient volumetric compression method is urgently needed. Recent years have witnessed the progress of deep learning-based approaches for two-dimensional (2D) natural image compression, but the field of learned volumetric image compression remains unexplored. In this paper, we propose the first end-to-end learning framework for volumetric image compression by extending the advanced techniques of 2D image compression to volumetric images. Specifically, a convolutional autoencoder is used to compress 3D image cubes, and non-local attention models are embedded in the convolutional autoencoder to jointly capture local and global correlations. Both hyperprior and autoregressive models are used to perform the conditional probability estimation in entropy coding. To reduce model complexity, we introduce a convolutional long short-term memory network for the autoregressive model based on channel-wise prediction. Experimental results on volumetric mouse brain images show that the proposed method outperforms JPEG2000-3D, HEVC and state-of-the-art 2D methods.

  • Research Article
  • 10.1186/s13634-025-01268-x
Variable rate compression with Uniform Spatial-Frequency Residual Bottleneck Adapter for learned image compression
  • Dec 29, 2025
  • EURASIP Journal on Advances in Signal Processing
  • Ran Wang + 3 more

Recent advances in learned image compression (LIC) have demonstrated superior performance over traditional methods but often require training and storage of multiple models to handle different bitrate settings. In this paper, we propose the Uniform Spatial-Frequency Residual Bottleneck Modulation Adapter (U-SFRB), a plug-and-play, adapter-based framework for variable rate image compression that significantly reduces training and storage overhead. Our method freezes the backbone network and only trains lightweight adapters—Spatial-Frequency Residual Bottleneck Adapters (SFRBs)—to achieve rate adaptability. By inserting multiple SFRBs in parallel, our approach enables a single model to support a wide range of bitrates. Unlike prompt-based methods restricted to transformer architectures, our approach is compatible with both CNN- and transformer-based compression models. Experimental results on the Kodak and CLIC datasets show that our method achieves competitive rate-distortion performance compared to state-of-the-art variable rate compression approaches, with the advantage of lower training complexity and better model flexibility.

  • Research Article
  • 10.3390/jimaging12010012
Patched-Based Swin Transformer Hyperprior for Learned Image Compression
  • Dec 26, 2025
  • Journal of Imaging
  • Sibusiso B Buthelezi + 1 more

We present a hybrid end-to-end learned image compression framework that combines a CNN-based variational autoencoder (VAE) with an efficient hierarchical Swin Transformer to address the limitations of existing entropy models in capturing global dependencies under computational constraints. Traditional VAE-based codecs typically rely on CNN-based priors with localized receptive fields, which are insufficient for modelling the complex, high-dimensional dependencies of the latent space, thereby limiting compression efficiency. While fully global transformer-based models can capture long-range dependencies, their high computational complexity makes them impractical for high-resolution image compression. To overcome this trade-off, our approach couples a CNN-based VAE with a patch-based hierarchical Swin Transformer hyperprior that employs shifted window self-attention to effectively model both local and global contextual information while maintaining computational efficiency. The proposed framework tightly integrates this expressive entropy model with an end-to-end differentiable quantization module, enabling joint optimization of the complete rate-distortion objective. By learning a more accurate probability distribution of the latent representation, the model achieves improved bitrate estimation and a more compact latent representation, resulting in enhanced compression performance. We validate our approach on the widely used Kodak, JPEG AI, and CLIC datasets, demonstrating that the proposed hybrid architecture achieves superior rate-distortion performance, delivering higher visual quality at lower bitrates compared to methods relying on simpler CNN-based entropy priors. This work demonstrates the effectiveness of integrating efficient transformer architectures into learned image compression and highlights their potential for advancing entropy modelling beyond conventional CNN-based designs.

  • Research Article
  • 10.1609/aaai.v39i10.33100
Few-Shot Domain Adaptation for Learned Image Compression
  • Apr 11, 2025
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Tianyu Zhang + 4 more

Learned image compression (LIC) has achieved state-of-the-art rate-distortion performance and is deemed promising for next-generation image compression techniques. However, pre-trained LIC models usually suffer from significant performance degradation when applied to out-of-training-domain images, which indicates poor generalization. To tackle this problem, we propose a few-shot domain adaptation method for LIC by integrating plug-and-play adapters into pre-trained models. Drawing inspiration from the analogy between latent channels and frequency components, we examine domain gaps in LIC and observe that out-of-training-domain images disrupt pre-trained channel-wise decomposition. Consequently, we introduce a method for channel-wise re-allocation using convolution-based adapters and low-rank adapters, which are lightweight and compatible with mainstream LIC schemes. Extensive experiments across multiple domains and multiple representative LIC schemes demonstrate that our method significantly enhances pre-trained models, achieving performance comparable to H.266/VVC intra coding with merely 25 target-domain samples. Additionally, our method matches the performance of full-model fine-tuning while transmitting fewer than 2% of the parameters.
