ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding
Recently, learned image compression techniques have achieved remarkable performance, even surpassing the best manually designed lossy image coders. They are promising to be large-scale adopted. For the sake of practicality, a thorough investigation of the architecture design of learned image compression, regarding both compression performance and running speed, is essential. In this paper, we first propose uneven channel-conditional adaptive coding, motivated by the observation of energy compaction in learned image compression. Combining the proposed uneven grouping model with existing context models, we obtain a spatial-channel contextual adaptive model to improve the coding performance without damage to running speed. Then we study the structure of the main transform and propose an efficient model, ELIC, to achieve state-of-the-art speed and compression ability. With superior performance, the proposed model also supports extremely fast preview decoding and progressive decoding, which makes the coming application of learning-based image compression more promising.
- Research Article
16
- 10.1109/tcsvt.2022.3229701
- Jun 1, 2023
- IEEE Transactions on Circuits and Systems for Video Technology
Progressive coding is essential to the practical deployment of learned image compression over heterogeneous networks and clients. Existing methods for learned progressive image compression require complex and empirical design to achieve near-optimal rate-distortion performance over a wide range of bit-rates. However, these methods are limited by the implicit learned mechanism based on neural networks and introduction of uniform quantizers. In this paper, we propose generalized learned progressive image compression with analytic rate-distortion optimization using dead-zone quantizers on the latent representation. Specifically, we reveal that dead-zone quantizers, as a general case of uniform quantizers, are equivalent to uniform quantizers in fixed-rate nonlinear transform coding and can prevent extra redundancy in embedded quantization for progressive coding. Consequently, we propose rate-distortion optimized learned progressive coding by approximating the optimal quantizer in the source spaces using dead-zone quantizers in an analytic manner on the Laplacian source. To our best knowledge, this paper is the first to achieve general learned progressive coding from the perspective of optimal quantizers. The proposed method achieves theoretically sound and practically efficient embedded quantization and learned progressive coding of latent representations with improved rate-distortion performance. It can also enable embedded quantization with diverse assignments of truncation points and support flexible configuration of quality layers of varying numbers and at varying target bit-rates. Furthermore, we successfully incorporate the proposed method into existing pre-trained fixed-rate models to realize progressive learned image compression without re-training. Experimental results demonstrate that the proposed method achieves state-of-the-art rate-distortion performance in learned progressive image compression compared with traditional codecs and recent learned methods.
- Research Article
66
- 10.1109/tcsvt.2021.3119660
- Jun 1, 2022
- IEEE Transactions on Circuits and Systems for Video Technology
Recent works on learned image compression perform encoding and decoding processes in a full-resolution manner, resulting in two problems when deployed for practical applications. First, parallel acceleration of the autoregressive entropy model cannot be achieved due to serial decoding. Second, full-resolution inference often causes the out-of-memory (OOM) problem with limited GPU resources, especially for high-resolution images. Block partition is a good choice to handle the above issues, but it brings about new challenges in reducing the redundancy between blocks and eliminating block effects. To tackle the above challenges, this paper provides a learned block-based hybrid image compression (LBHIC) framework. Specifically, we introduce explicit intra prediction into a learned image compression framework to utilize the relation among adjacent blocks. Superior to context modeling by linear weighting of neighbor pixels in traditional codecs, we propose a contextual prediction module (CPM) to better capture long-range correlations by utilizing the strip pooling to extract the most relevant information in neighboring latent space, thus achieving effective information prediction. Moreover, to alleviate blocking artifacts, we further propose a boundary-aware postprocessing module (BPM) with the edge importance taken into account. Extensive experiments demonstrate that the proposed LBHIC codec outperforms the VVC, with a bit-rate conservation of 4.1%, and reduces the decoding time by approximately 86.7% compared with that of state-of-the-art learned image compression methods.
- Research Article
20
- 10.3390/rs15082211
- Apr 21, 2023
- Remote Sensing
Learned image compression has achieved a series of breakthroughs for nature images, but there is little literature focusing on high-resolution remote sensing image (HRRSI) datasets. This paper focuses on designing a learned lossy image compression framework for compressing HRRSIs. Considering the local and non-local redundancy contained in HRRSI, a mixed hyperprior network is designed to explore both the local and non-local redundancy in order to improve the accuracy of entropy estimation. In detail, a transformer-based hyperprior and a CNN-based hyperprior are fused for entropy estimation. Furthermore, to reduce the mismatch between training and testing, a three-stage training strategy is introduced to refine the network. In this training strategy, the entire network is first trained, and then some sub-networks are fixed while the others are trained. To evaluate the effectiveness of the proposed compression algorithm, the experiments are conducted on an HRRSI dataset. The results show that the proposed algorithm achieves comparable or better compression performance than some traditional and learned image compression algorithms, such as Joint Photographic Experts Group (JPEG) and JPEG2000. At a similar or lower bitrate, the proposed algorithm is about 2 dB higher than the PSNR value of JPEG2000.
- Research Article
- 10.1109/tcsvt.2024.3522621
- May 1, 2025
- IEEE Transactions on Circuits and Systems for Video Technology
In the field of autonomous driving, a variety of sensor data types exist, each representing different modalities of the same scene. Therefore, it is feasible to utilize data from other sensors to facilitate image compression. However, few techniques have explored the potential benefits of utilizing inter-modality correlations to enhance the image compression performance. In this paper, motivated by the recent success of learned image compression, we propose a new framework that uses sparse point clouds to assist in learned image compression in the autonomous driving scenario. We first project the 3D sparse point cloud onto a 2D plane, resulting in a sparse depth map. Utilizing this depth map, we proceed to predict camera images. Subsequently, we use these predicted images to extract multi-scale structural features. These features are then incorporated into learned image compression pipeline as additional information to improve the compression performance. Our proposed framework is compatible with various mainstream learned image compression models, and we validate our approach using different existing image compression methods. The experimental results show that incorporating point cloud assistance into the compression pipeline consistently enhances the performance.
- Conference Article
3
- 10.1109/icip40778.2020.9190974
- Oct 1, 2020
With recent advances in learned entropy and context models, the rate-distortion performance of deep learned image compression methods reached or surpassed those of conventional codecs. However, learned image compression is currently more complex and slower than conventional image compression. Learned image and video compression methods almost exclusively employ the generalized divisive normalization (GDN) activation function. This paper investigates the effect of activation function on the performance of image compression in terms of both objective and subjective criteria as well as runtime. In particular, we show that the distribution of latents produced by hard shrinkage fits a Laplacian better, and it is possible to achieve similar rate-distortion and better visual performance using hard shrinkage with lower complexity.
- Conference Article
112
- 10.1109/cvpr.2019.01031
- Jun 1, 2019
Compression has been an important research topic for many decades, to produce a significant impact on data transmission and storage. Recent advances have shown a great potential of learning image and video compression. Inspired from related works, in this paper, we present an image compression architecture using a convolutional autoencoder, and then generalize image compression to video compression, by adding an interpolation loop into both encoder and decoder sides. Our basic idea is to realize spatial-temporal energy compaction in learning image and video compression. Thereby, we propose to add a spatial energy compaction-based penalty into loss function, to achieve higher image compression performance. Furthermore, based on temporal energy distribution, we propose to select the number of frames in one interpolation loop, adapting to the motion characteristics of video contents. Experimental results demonstrate that our proposed image compression outperforms the latest image compression standard with MS-SSIM quality metric, and provides higher performance compared with state-of-the-art learning compression methods at high bit rates, which benefits from our spatial energy compaction approach. Meanwhile, our proposed video compression approach with temporal energy compaction can significantly outperform MPEG-4 and is competitive with commonly used H.264. Both our image and video compression can produce more visually pleasant results than traditional standards.
- Dissertation
- 10.33915/etd.13084
- Jan 1, 2025
The rapid advancement of information technology and the exponential growth of digital communication have significantly increased the demand for efficient data compression techniques that reduce storage requirements, minimize bandwidth consumption, and accelerate data transmission—without substantially compromising data quality. This dissertation addresses these challenges by investigating and developing advanced learned image compression (LIC) methods, with a particular focus on lossy compression for both natural images and scientific imagery obtained from NASA’s Solar Dynamics Observatory (SDO) mission. Traditional image compression standards—such as JPEG, JPEG2000, BPG, and HEVC—rely on manually engineered transforms and heuristic rules, which often lack the adaptability required to accommodate diverse visual content and application-specific constraints. In contrast, learned image compression employs deep neural networks trained in an end-to-end manner, guided by principles from rate–distortion theory, to optimize the trade-off between compression efficiency and reconstruction fidelity. In the first part of this dissertation, several technical challenges in developing neural image compression codecs for natural images (general-purpose) are addressed, including the design of expressive nonlinear transforms, accurate entropy modeling, and the integration of perceptually meaningful loss functions. To this end, several learned image compression frameworks are proposed, each introducing distinct design innovations: a Transformer-based nonlinear transform that captures both local and global dependencies, an advanced entropy model that improves probability estimation and coding efficiency, and a conditional diffusion-based generative framework that enhances the perceptual quality of reconstructed images. The second part focuses on the application of learned compression to imagery from NASA’s Solar Dynamics Observatory (SDO) mission. A learned video compression framework is developed to exploit both spatial and temporal redundancies in solar image sequences. Furthermore, an adaptive compression strategy is introduced to prioritize scientific relevance: images containing solar flare events are compressed at lower ratios to preserve critical information, whereas non-flare images are compressed more aggressively to maximize storage and transmission efficiency. Collectively, these contributions advance the field of learned image compression across both general-purpose and scientific imaging domains, providing practical solutions for improving data transmission and storage efficiency in real-world and mission-critical environments.
- Research Article
9
- 10.1016/j.neucom.2022.07.065
- Jul 22, 2022
- Neurocomputing
Successive learned image compression: Comprehensive analysis of instability
- Research Article
7
- 10.1016/j.engappai.2023.107596
- Dec 4, 2023
- Engineering Applications of Artificial Intelligence
Learned image compression via neighborhood-based attention optimization and context modeling with multi-scale guiding
- Research Article
- 10.1609/aaai.v39i10.33100
- Apr 11, 2025
- Proceedings of the AAAI Conference on Artificial Intelligence
Learned image compression (LIC) has achieved state-of-the-art rate-distortion performance, deemed promising for next-generation image compression techniques. However, pre-trained LIC models usually suffer from significant performance degradation when applied to out-of-training-domain images, implying their poor generalization capabilities. To tackle this problem, we propose a few-shot domain adaptation method for LIC by integrating plug-and-play adapters into pre-trained models. Drawing inspiration from the analogy between latent channels and frequency components, we examine domain gaps in LIC and observe that out-of-training-domain images disrupt pre-trained channel-wise decomposition. Consequently, we introduce a method for channel-wise re-allocation using convolution-based adapters and low-rank adapters, which are lightweight and compatible to mainstream LIC schemes. Extensive experiments across multiple domains and multiple representative LIC schemes demonstrate that our method significantly enhances pre-trained models, achieving comparable performance to H.266/VVC intra coding with merely 25 target-domain samples. Additionally, our method matches the performance of full-model finetune while transmitting fewer than 2% of the parameters.
- Book Chapter
2
- 10.1007/978-3-031-19839-7_16
- Jan 1, 2022
In Cloud 3D, such as Cloud Gaming and Cloud Virtual Reality (VR), image frames are rendered and compressed (encoded) in the cloud, and sent to the clients for users to view. For low latency and high image quality, fast, high compression rate, and high-quality image compression techniques are preferable. This paper explores computation time reduction techniques for learned image compression to make it more suitable for cloud 3D. More specifically, we employed slim (low-complexity) and application-specific AI models to reduce the computation time without degrading image quality. Our approach is based on two key insights: (1) as the frames generated by a 3D application are highly homogeneous, application-specific compression models can improve the rate-distortion performance over a general model; (2) many computer-generated frames from 3D applications are less complex than natural photos, which makes it feasible to reduce the model complexity to accelerate compression computation. We evaluated our models on six gaming image datasets. The results show that our approach has similar rate-distortion performance as a state-of-the-art learned image compression algorithm, while obtaining about 5x to 9x speedup and reducing the compression time to be less than 1 s (0.74s), bringing learned image compression closer to being viable for cloud 3D. Code is available at https://github.com/cloud-graphics-rendering/AppSpecificLIC.KeywordsCloud gamingCloud virtual realityLearned image compressionModel simplificationApplication-specific modelingModel-task balance
- Research Article
20
- 10.1109/tip.2023.3319275
- Jan 1, 2023
- IEEE Transactions on Image Processing
Learned image compression methods have achieved satisfactory results in recent years. However, existing methods are typically designed for RGB format, which are not suitable for YUV420 format due to the variance of different formats. In this paper, we propose an information-guided compression framework using cross-component attention mechanism, which can achieve efficient image compression in YUV420 format. Specifically, we design a dual-branch advanced information-preserving module (AIPM) based on the information-guided unit (IGU) and attention mechanism. On the one hand, the dual-branch architecture can prevent changes in original data distribution and avoid information disturbance between different components. The feature attention block (FAB) can preserve the important information. On the other hand, IGU can efficiently utilize the correlations between Y and UV components, which can further preserve the information of UV by the guidance of Y. Furthermore, we design an adaptive cross-channel enhancement module (ACEM) to reconstruct the details by utilizing the relations from different components, which makes use of the reconstructed Y as the textural and structural guidance for UV components. Extensive experiments show that the proposed framework can achieve the state-of-the-art performance in image compression for YUV420 format. More importantly, the proposed framework outperforms Versatile Video Coding (VVC) with 8.37% BD-rate reduction on common test conditions (CTC) sequences on average. In addition, we propose a quantization scheme for context model without model retraining, which can overcome the cross-platform decoding error caused by the floating-point operations in context model and provide a reference approach for the application of neural codec on different platforms.
- Research Article
8
- 10.1109/tcsvt.2024.3401872
- Oct 1, 2024
- IEEE Transactions on Circuits and Systems for Video Technology
In recent years, Learned Image Compression (LIC) has undergone rapid evolution. However, it is worthy noting that most prevalent LIC methodologies still rely on uniform Scalar Quantization (SQ) for latent features. This overlooks the untapped potential of contextual information, which could be leveraged to significantly reduce statistical redundancies. Prior researches have explored Vector Quantization (VQ)’s adaptability to diverse data distributions, yet it introduces significant computational complexity into LIC, hindering its practical implementation. Consequently, in this work, we propose the Contextual Sequential Quantization (CSQ) method, which progressively discretizes the latent features of LIC by harnessing content contextual information and image textural priors. Our proposed CSQ signifies progress in LIC by blending the computational efficiency of SQ with a substantial approach towards the adaptability of VQ. We further propose the Center Compensation Module (CCM) based on the proposed CSQ. This module strategically determines adaptive quantization centers, leading to a direct enhancement of reconstruction quality without compromising the bit-rate. Moreover, it is worth noticing that existing LIC approaches face challenges in leveraging hyper side information to effectively enhance transformations, which is attributed to the entanglement of the hyperprior generation module with the main transformations. Consequently, we propose to decouple the hyperprior module from main transformations, and design the Hyperprior-Assisted Transformation (HAT) unit to feed hyperprior back into main transformations. This further improves the coding performance. By integrating all together the proposed CSQ, CCM, and HAT, our proposed Non-uniform quantization-based LIC (NLIC) method attains state-of-the-art rate-distortion (R-D) performance among existing LIC methodologies.
- Research Article
- 10.3390/jimaging12010012
- Dec 26, 2025
- Journal of Imaging
We present a hybrid end-to-end learned image compression framework that combines a CNN-based variational autoencoder (VAE) with an efficient hierarchical Swin Transformer to address the limitations of existing entropy models in capturing global dependencies under computational constraints. Traditional VAE-based codecs typically rely on CNN-based priors with localized receptive fields, which are insufficient for modelling the complex, high-dimensional dependencies of the latent space, thereby limiting compression efficiency. While fully global transformer-based models can capture long-range dependencies, their high computational complexity makes them impractical for high-resolution image compression. To overcome this trade-off, our approach couples a CNN-based VAE with a patch-based hierarchical Swin Transformer hyperprior that employs shifted window self-attention to effectively model both local and global contextual information while maintaining computational efficiency. The proposed framework tightly integrates this expressive entropy model with an end-to-end differentiable quantization module, enabling joint optimization of the complete rate-distortion objective. By learning a more accurate probability distribution of the latent representation, the model achieves improved bitrate estimation and a more compact latent representation, resulting in enhanced compression performance. We validate our approach on the widely used Kodak, JPEG AI, and CLIC datasets, demonstrating that the proposed hybrid architecture achieves superior rate-distortion performance, delivering higher visual quality at lower bitrates compared to methods relying on simpler CNN-based entropy priors. This work demonstrates the effectiveness of integrating efficient transformer architectures into learned image compression and highlights their potential for advancing entropy modelling beyond conventional CNN-based designs.
- Conference Article
4
- 10.1109/vcip49819.2020.9301753
- Dec 1, 2020
The DCT-based transform coding technique was adopted by the international standards (ISO JPEG, ITU H.261/264/265, ISO MPEG-2/4/H, and many others) for nearly 30 years. Although researchers are still trying to improve its efficiency by fine-tuning its components and parameters, the basic structure has not changed in the past two decades.The deep learning technology recently developed may provide a new direction for constructing a high-compression image/video coding system. Recent results, particularly from the Challenge on Learned Image Compression (CLIC) at CVPR, indicate that this new type of schemes (often trained end-to-end) may have good potential for further improving compression efficiency.In the first part of this tutorial, we shall (1) summarize briefly the progress of this topic in the past 3 or so years, including an overview of CLIC results and JPEG AI Call-for-Evidence Challenge on Learning-based Image Coding (issued in early 2020). Because Deep Neural Network (DNN)-based image compression is a new area, several techniques and structures have been tested. The recently published autoencoder-based schemes can achieve similar PSNR to BPG (Better Portable Graphics, H.265 still image standard) and has superior subject quality (e.g., MSSSIM), especially at the very low bit rates. In the second part, we shall (2) address the detailed design concepts of image compression algorithms using the autoencoder structure. In the third part, we shall switch gears to (3) explore the emerging area of DNN-based video compression. Recent publications in this area have indicated that end-to-end trained video compression can achieve comparable or superior rate-distortion performance to HEVC/H.265. The CLIC at CVPR 2020 also created for the first time a new track dedicated to P-frame coding.