
Unveiling the Future of Human and Machine Coding: A Survey of End-to-End Learned Image Compression.

Abstract

End-to-end learned image compression codecs have notably emerged in recent years. These codecs have demonstrated superiority over conventional methods, showcasing remarkable flexibility and adaptability across diverse data domains while supporting new distortion losses. Despite challenges such as computational complexity, learned image compression methods inherently align with learning-based data processing and analytic pipelines due to their well-suited internal representations. The concept of Video Coding for Machines has garnered significant attention from both academic researchers and industry practitioners. This concept reflects the growing need to integrate data compression with computer vision applications. In light of these developments, we present a comprehensive survey and review of lossy image compression methods. Additionally, we provide a concise overview of two prominent international standards, MPEG Video Coding for Machines and JPEG AI. These standards are designed to bridge the gap between data compression and computer vision, catering to practical industry use cases.

Similar Papers
  • Research Article
  • Cite Count: 16
  • 10.1109/tcsvt.2022.3229701
Learned Progressive Image Compression With Dead-Zone Quantizers
  • Jun 1, 2023
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Shaohui Li + 5 more

Progressive coding is essential to the practical deployment of learned image compression over heterogeneous networks and clients. Existing methods for learned progressive image compression require complex and empirical design to achieve near-optimal rate-distortion performance over a wide range of bit-rates. However, these methods are limited by the implicit learned mechanism based on neural networks and introduction of uniform quantizers. In this paper, we propose generalized learned progressive image compression with analytic rate-distortion optimization using dead-zone quantizers on the latent representation. Specifically, we reveal that dead-zone quantizers, as a general case of uniform quantizers, are equivalent to uniform quantizers in fixed-rate nonlinear transform coding and can prevent extra redundancy in embedded quantization for progressive coding. Consequently, we propose rate-distortion optimized learned progressive coding by approximating the optimal quantizer in the source spaces using dead-zone quantizers in an analytic manner on the Laplacian source. To our best knowledge, this paper is the first to achieve general learned progressive coding from the perspective of optimal quantizers. The proposed method achieves theoretically sound and practically efficient embedded quantization and learned progressive coding of latent representations with improved rate-distortion performance. It can also enable embedded quantization with diverse assignments of truncation points and support flexible configuration of quality layers of varying numbers and at varying target bit-rates. Furthermore, we successfully incorporate the proposed method into existing pre-trained fixed-rate models to realize progressive learned image compression without re-training. Experimental results demonstrate that the proposed method achieves state-of-the-art rate-distortion performance in learned progressive image compression compared with traditional codecs and recent learned methods.
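As a rough illustration of the relationship described above, the sketch below shows the classic rounding-offset form of a scalar dead-zone quantizer. It is not the paper's analytic design: the function names and the `step` and `offset` parameters are assumptions chosen for illustration. With `offset=0.5` it reduces to a plain uniform (rounding) quantizer, and a smaller offset widens the zero bin.

```python
import math

def deadzone_quantize(x, step, offset=0.5):
    """Map x to an integer index.

    offset=0.5 gives ordinary rounding (uniform quantizer);
    offset<0.5 widens the zero bin (the "dead zone").
    Hypothetical helper, not taken from the paper.
    """
    magnitude = math.floor(abs(x) / step + offset)
    # Avoid returning -0 when the value falls into the zero bin.
    return int(math.copysign(magnitude, x)) if magnitude else 0

def dequantize(index, step):
    """Reconstruct at the bin's nominal point (no reconstruction offset)."""
    return index * step

# The same input lands in different bins depending on the offset.
print(deadzone_quantize(0.6, 1.0, offset=0.5))    # uniform rounding
print(deadzone_quantize(0.6, 1.0, offset=1 / 3))  # wider zero bin
```

In this form the zero bin has width `2 * (1 - offset) * step`, so the uniform quantizer is the special case in which the zero bin is exactly one step wide.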

  • Research Article
  • 10.1109/tmm.2026.3651136
Learned Image Compression Via Local-to-Global Cross-Component Prior
  • Jan 1, 2026
  • IEEE Transactions on Multimedia
  • Wenhong Duan + 6 more

Learned image compression (LIC) methods have shown promising results and achieved superior performance compared to traditional image compression methods. Due to the neglect of the utilization of cross-component correlations, there is still a potential for further performance improvement. In this paper, we first explore the inter-channel correlations of different color spaces and transform the image compression problem in RGB color space into that in YUV color space, which has cross-component prior information. We propose a novel image compression method that leverages local-to-global cross-component prior modeling, utilizing a cross-component attention mechanism to improve coding performance. First, we design the cross-component prior gate (CPG) to model the cross-component prior information based on attention mechanism. Inspired by common knowledge in data compression, luma component (Y) contains more details and textural/structural information compared to chroma components (UV). The proposed method can make full use of the cross-component guidance information from luma to chroma components to achieve effective image compression. Experimental results demonstrate that the proposed method can achieve superior performance compared to existing learned image compression methods. The proposed method can achieve 9.20% rate savings compared to the image compression standard Versatile Video Coding (VVC) Test Model (VTM-11.0) on Kodak dataset.
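The abstract does not say which YUV definition is used, so the following is only a minimal sketch assuming the analog BT.601 weights. It shows why the luma channel carries most of the structure: Y is a weighted sum of all three channels, while U and V are scaled differences from it. The function name is hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (floats in [0, 1]) to YUV.

    Assumes analog BT.601 coefficients; the paper may use a
    different variant (e.g. a YCbCr definition with offsets).
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma: weighted sum of R, G, B
    u = 0.492 * (b - y)                    # blue-difference chroma
    v = 0.877 * (r - y)                    # red-difference chroma
    return y, u, v

# A gray pixel has no chroma: U and V are (numerically) zero.
print(rgb_to_yuv(0.5, 0.5, 0.5))
```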

  • Research Article
  • Cite Count: 66
  • 10.1109/tcsvt.2021.3119660
Learned Block-Based Hybrid Image Compression
  • Jun 1, 2022
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Yaojun Wu + 4 more

Recent works on learned image compression perform encoding and decoding processes in a full-resolution manner, resulting in two problems when deployed for practical applications. First, parallel acceleration of the autoregressive entropy model cannot be achieved due to serial decoding. Second, full-resolution inference often causes the out-of-memory (OOM) problem with limited GPU resources, especially for high-resolution images. Block partition is a good choice to handle the above issues, but it brings about new challenges in reducing the redundancy between blocks and eliminating block effects. To tackle the above challenges, this paper provides a learned block-based hybrid image compression (LBHIC) framework. Specifically, we introduce explicit intra prediction into a learned image compression framework to utilize the relation among adjacent blocks. Superior to context modeling by linear weighting of neighbor pixels in traditional codecs, we propose a contextual prediction module (CPM) to better capture long-range correlations by utilizing the strip pooling to extract the most relevant information in neighboring latent space, thus achieving effective information prediction. Moreover, to alleviate blocking artifacts, we further propose a boundary-aware postprocessing module (BPM) with the edge importance taken into account. Extensive experiments demonstrate that the proposed LBHIC codec outperforms the VVC, with a bit-rate conservation of 4.1%, and reduces the decoding time by approximately 86.7% compared with that of state-of-the-art learned image compression methods.

  • Research Article
  • Cite Count: 16
  • 10.1109/tbc.2024.3464413
JND-LIC: Learned Image Compression via Just Noticeable Difference for Human Visual Perception
  • Mar 1, 2025
  • IEEE Transactions on Broadcasting
  • Zhaoqing Pan + 6 more

Existing human visual perception-oriented image compression methods well maintain the perceptual quality of compressed images, but they may introduce fake details into the compressed images, and cannot dynamically improve the perceptual rate-distortion performance at the pixel level. To address these issues, a just noticeable difference (JND)-based learned image compression (JND-LIC) method is proposed for human visual perception in this paper, in which a weight-shared model is used to extract image features and JND features, and the learned JND features are utilized as perceptual prior knowledge to assist the image coding process. In order to generate a highly compact image feature representation, a JND-based feature transform module is proposed to model the pixel-to-pixel masking correlation between the image features and the JND features. Furthermore, inspired by eye movement research that the human visual system perceives image degradation unevenly, a JND-guided quantization mechanism is proposed for the entropy coding, which adjusts the quantization step of each pixel to further eliminate perceptual redundancies. Extensive experimental results show that our proposed JND-LIC significantly improves the perceptual quality of compressed images with fewer coding bits compared to state-of-the-art learned image compression methods. Additionally, the proposed method can be flexibly integrated with various advanced learned image compression methods, and has robust generalization capabilities to improve the efficiency of perceptual coding.

  • Conference Article
  • Cite Count: 3
  • 10.1109/icip40778.2020.9190974
Shrinkage as Activation for Learned Image Compression
  • Oct 1, 2020
  • Ogun Kirmemis + 1 more

With recent advances in learned entropy and context models, the rate-distortion performance of deep learned image compression methods reached or surpassed those of conventional codecs. However, learned image compression is currently more complex and slower than conventional image compression. Learned image and video compression methods almost exclusively employ the generalized divisive normalization (GDN) activation function. This paper investigates the effect of activation function on the performance of image compression in terms of both objective and subjective criteria as well as runtime. In particular, we show that the distribution of latents produced by hard shrinkage fits a Laplacian better, and it is possible to achieve similar rate-distortion and better visual performance using hard shrinkage with lower complexity.
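For readers unfamiliar with the activation named above, hard shrinkage zeroes every value whose magnitude does not exceed a threshold and passes the rest through unchanged. The sketch below is a plain-Python illustration. The threshold value of 0.5 is an assumption for the example, not a setting reported in the abstract.

```python
def hard_shrink(x, threshold=0.5):
    """Hard shrinkage: keep x if |x| > threshold, otherwise output 0.

    The default threshold is illustrative only.
    """
    return x if abs(x) > threshold else 0.0

# Small activations are zeroed; large ones pass through unchanged.
latents = [-1.2, -0.3, 0.0, 0.4, 0.9]
print([hard_shrink(v) for v in latents])
```

Because many small values collapse to exactly zero, the output distribution is sharply peaked at zero, which is consistent with the abstract's observation that the resulting latents fit a Laplacian more closely.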

  • Dissertation
  • 10.33915/etd.13084
Neural Network-based Image Compression
  • Jan 1, 2025
  • Atefeh Khoshkhahtinat

The rapid advancement of information technology and the exponential growth of digital communication have significantly increased the demand for efficient data compression techniques that reduce storage requirements, minimize bandwidth consumption, and accelerate data transmission without substantially compromising data quality. This dissertation addresses these challenges by investigating and developing advanced learned image compression (LIC) methods, with a particular focus on lossy compression for both natural images and scientific imagery obtained from NASA’s Solar Dynamics Observatory (SDO) mission. Traditional image compression standards, such as JPEG, JPEG2000, BPG, and HEVC, rely on manually engineered transforms and heuristic rules, which often lack the adaptability required to accommodate diverse visual content and application-specific constraints. In contrast, learned image compression employs deep neural networks trained in an end-to-end manner, guided by principles from rate–distortion theory, to optimize the trade-off between compression efficiency and reconstruction fidelity. In the first part of this dissertation, several technical challenges in developing neural image compression codecs for natural images (general-purpose) are addressed, including the design of expressive nonlinear transforms, accurate entropy modeling, and the integration of perceptually meaningful loss functions. To this end, several learned image compression frameworks are proposed, each introducing distinct design innovations: a Transformer-based nonlinear transform that captures both local and global dependencies, an advanced entropy model that improves probability estimation and coding efficiency, and a conditional diffusion-based generative framework that enhances the perceptual quality of reconstructed images. The second part focuses on the application of learned compression to imagery from NASA’s Solar Dynamics Observatory (SDO) mission. A learned video compression framework is developed to exploit both spatial and temporal redundancies in solar image sequences. Furthermore, an adaptive compression strategy is introduced to prioritize scientific relevance: images containing solar flare events are compressed at lower ratios to preserve critical information, whereas non-flare images are compressed more aggressively to maximize storage and transmission efficiency. Collectively, these contributions advance the field of learned image compression across both general-purpose and scientific imaging domains, providing practical solutions for improving data transmission and storage efficiency in real-world and mission-critical environments.

  • Conference Article
  • Cite Count: 2
  • 10.1109/cccai59026.2023.00041
HFLIC: Human Friendly Perceptual Learned Image Compression with Reinforced Transform
  • Jun 1, 2023
  • Peirong Ning + 2 more

In recent years, there has been rapid development in learned image compression techniques that prioritize rate-distortion-perceptual compression, preserving fine details even at lower bit-rates. However, current learning-based image compression methods often sacrifice human-friendly compression and require long decoding times. In this paper, we propose enhancements to the backbone network and loss function of existing image compression model, focusing on improving human perception and efficiency. Our proposed approach achieves competitive subjective results compared to state-of-the-art end-to-end learned image compression methods and classic methods, while requiring less decoding time and offering human-friendly compression. Through empirical evaluation, we demonstrate the effectiveness of our proposed method in achieving outstanding performance, with more than 25% bit-rate saving with comparable perceptual quality.

  • Research Article
  • 10.1109/tcsvt.2024.3522621
Sparse Point Clouds Assisted Learned Image Compression
  • May 1, 2025
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Yiheng Jiang + 4 more

In the field of autonomous driving, a variety of sensor data types exist, each representing different modalities of the same scene. Therefore, it is feasible to utilize data from other sensors to facilitate image compression. However, few techniques have explored the potential benefits of utilizing inter-modality correlations to enhance the image compression performance. In this paper, motivated by the recent success of learned image compression, we propose a new framework that uses sparse point clouds to assist in learned image compression in the autonomous driving scenario. We first project the 3D sparse point cloud onto a 2D plane, resulting in a sparse depth map. Utilizing this depth map, we proceed to predict camera images. Subsequently, we use these predicted images to extract multi-scale structural features. These features are then incorporated into learned image compression pipeline as additional information to improve the compression performance. Our proposed framework is compatible with various mainstream learned image compression models, and we validate our approach using different existing image compression methods. The experimental results show that incorporating point cloud assistance into the compression pipeline consistently enhances the performance.
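The first step described above, projecting a sparse 3D point cloud onto the image plane to obtain a sparse depth map, can be sketched with a simple pinhole camera model. This is an illustration under stated assumptions, not the authors' pipeline: the points are assumed to be already expressed in the camera frame, the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical parameters, and `0.0` is used to mark pixels with no measurement.

```python
def project_to_depth_map(points, fx, fy, cx, cy, width, height):
    """Project camera-frame 3D points (x, y, z) to a sparse depth map.

    Pinhole model: u = fx * x / z + cx, v = fy * y / z + cy.
    Pixels that receive no point keep the value 0.0.
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue  # point is behind the camera
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # Keep the nearest point when several land on one pixel.
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth

# Two points fall on the center pixel; the closer one (2.0) is kept.
pts = [(0.0, 0.0, 2.0), (0.0, 0.0, 5.0), (1.0, 0.0, -1.0), (10.0, 0.0, 1.0)]
depth_map = project_to_depth_map(pts, fx=1.0, fy=1.0, cx=1.0, cy=1.0, width=3, height=3)
print(depth_map)
```

Most pixels stay empty, which is why the abstract describes a further step that predicts dense camera images from this map before extracting features.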

  • Conference Article
  • Cite Count: 112
  • 10.1109/cvpr.2019.01031
Learning Image and Video Compression Through Spatial-Temporal Energy Compaction
  • Jun 1, 2019
  • Zhengxue Cheng + 3 more

Compression has been an important research topic for many decades, to produce a significant impact on data transmission and storage. Recent advances have shown a great potential of learning image and video compression. Inspired from related works, in this paper, we present an image compression architecture using a convolutional autoencoder, and then generalize image compression to video compression, by adding an interpolation loop into both encoder and decoder sides. Our basic idea is to realize spatial-temporal energy compaction in learning image and video compression. Thereby, we propose to add a spatial energy compaction-based penalty into loss function, to achieve higher image compression performance. Furthermore, based on temporal energy distribution, we propose to select the number of frames in one interpolation loop, adapting to the motion characteristics of video contents. Experimental results demonstrate that our proposed image compression outperforms the latest image compression standard with MS-SSIM quality metric, and provides higher performance compared with state-of-the-art learning compression methods at high bit rates, which benefits from our spatial energy compaction approach. Meanwhile, our proposed video compression approach with temporal energy compaction can significantly outperform MPEG-4 and is competitive with commonly used H.264. Both our image and video compression can produce more visually pleasant results than traditional standards.

  • Research Article
  • Cite Count: 20
  • 10.3390/rs15082211
Remote Sensing Image Compression Based on the Multiple Prior Information
  • Apr 21, 2023
  • Remote Sensing
  • Chuan Fu + 1 more

Learned image compression has achieved a series of breakthroughs for natural images, but there is little literature focusing on high-resolution remote sensing image (HRRSI) datasets. This paper focuses on designing a learned lossy image compression framework for compressing HRRSIs. Considering the local and non-local redundancy contained in HRRSI, a mixed hyperprior network is designed to explore both the local and non-local redundancy in order to improve the accuracy of entropy estimation. In detail, a transformer-based hyperprior and a CNN-based hyperprior are fused for entropy estimation. Furthermore, to reduce the mismatch between training and testing, a three-stage training strategy is introduced to refine the network. In this training strategy, the entire network is first trained, and then some sub-networks are fixed while the others are trained. To evaluate the effectiveness of the proposed compression algorithm, the experiments are conducted on an HRRSI dataset. The results show that the proposed algorithm achieves comparable or better compression performance than some traditional and learned image compression algorithms, such as Joint Photographic Experts Group (JPEG) and JPEG2000. At a similar or lower bitrate, the PSNR of the proposed algorithm is about 2 dB higher than that of JPEG2000.
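To put the reported 2 dB figure in context, the sketch below computes PSNR from the mean squared error and converts a PSNR gain into the corresponding MSE ratio. The helper names and the 8-bit peak value of 255 are assumptions for illustration; the abstract does not state the bit depth of the test images.

```python
import math

def psnr(reference, distorted, peak=255.0):
    """PSNR in dB between two equally long sequences of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)

# A 2 dB PSNR gain corresponds to roughly a 1.58x lower MSE.
print(round(mse_ratio_for_gain(2.0), 2))
```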

  • Book Chapter
  • Cite Count: 15
  • 10.1007/978-3-031-19800-7_37
Content-Oriented Learned Image Compression
  • Jan 1, 2022
  • Meng Li + 4 more

In recent years, with the development of deep neural networks, end-to-end optimized image compression has made significant progress and exceeded the classic methods in terms of rate-distortion performance. However, most learning-based image compression methods are unlabeled and do not consider image semantics or content when optimizing the model. In fact, human eyes have different sensitivities to different content, so the image content also needs to be considered. In this paper, we propose a content-oriented image compression method, which handles different kinds of image contents with different strategies. Extensive experiments show that the proposed method achieves competitive subjective results compared with state-of-the-art end-to-end learned image compression methods or classic methods.

Keywords: Image compression, Content-oriented, Loss metric

  • Book Chapter
  • Cite Count: 1
  • 10.1007/978-3-030-88010-1_16
An Enhanced Multi-frequency Learned Image Compression Method
  • Jan 1, 2021
  • Lin He + 3 more

Learned image compression methods have recently shown the potential to outperform traditional image compression methods. However, current learned image compression methods utilize the same spatial resolution for all latent variables, which leaves some redundancy. By representing different frequency latent variables with different spatial resolutions, the spatial redundancy is reduced, which improves the R-D performance. Based on the recently introduced generalized octave convolutions, which factorize latent variables into different frequency components, an enhanced multi-frequency learned image compression method is introduced. In this paper, we incorporate the channel attention module into the multi-frequency learned image compression network to improve the performance of adaptive code word assignment. By using the attention module to capture the global correlation of latent variables, complex parts of the image such as textures and boundaries can be better reconstructed. In addition, an enhancement module on the decoder side provides further gains. Our method delivers good visual quality and achieves better MS-SSIM scores at low bit rates than other standard codecs and learning-based image compression methods.

  • Research Article
  • 10.1145/3803542
Learned Image Compression with Frequency Feature Interaction and Non-local Cross-similarity Prior
  • Mar 25, 2026
  • ACM Transactions on Multimedia Computing, Communications, and Applications
  • Jian Wang + 1 more

Recently learned image compression models have achieved better compression performance than traditional non-learning image compression standards. Those learned models usually utilize spatial self-attention and CNN to extract non-local and local features and generate the latent representation. However, previous methods adopt a linear layer to fuse non-local and local features and lack the flexibility to adaptively adjust feature weights and capture complex non-linear interactions between distinct feature representations. Additionally, how to more effectively compress the latent representation based on its channel similarity characteristics remains unexplored. To solve the above issues, we propose a novel image compression method with frequency feature interaction and non-local cross-similarity prior. More specifically, we extend the previous spatial self-attention module and alternately use spatial and channel self-attention modules to extract non-local spatial and channel features, respectively, and depth-wise convolution is utilized to extract local features. As local features focus on high-frequency detail information and non-local features concentrate on low-frequency structural information, we propose a frequency interaction module (FIM) that generates two weight maps to dynamically fuse non-local and local features. Moreover, we observe the non-local cross-similarity in different channels of the latent representation, which indicates that different channels share similar non-local semantic and structural information, but have distinct local detail information. So we design a dual transformer entropy model to emphasize non-local features and remove local features. Experiment results validate our method achieves promising compression performance on the Kodak, CLIC and Tecnick datasets.

  • Conference Article
  • Cite Count: 9
  • 10.1109/pcs50896.2021.9477479
A Practical Approach for Rate-Distortion-Perception Analysis in Learned Image Compression
  • Jun 1, 2021
  • Ogun Kirmemis + 1 more

Rate-distortion optimization (RDO) of codecs, where distortion is quantified by the mean-square error, has been a standard practice in image/video compression over the years. RDO serves well for optimization of codec performance for evaluation of the results in terms of PSNR. However, it is well known that the PSNR does not correlate well with perceptual evaluation of images; hence, RDO is not well suited for perceptual optimization of codecs. Recently, rate-distortion-perception trade-off has been formalized by taking the Kullback-Leibler (KL) divergence between the distributions of the original and reconstructed images as a perception measure. Learned image compression methods that simultaneously optimize rate, mean-square loss, VGG loss, and an adversarial loss were proposed. Yet, there exists no easy approach to fix the rate, distortion or perception at a desired level in a practical learned image compression solution to perform an analysis of the trade-off between rate, distortion and perception measures. In this paper, we propose a practical approach to fix the rate to carry out perception-distortion analysis at a fixed rate in order to perform perceptual evaluation of image compression results in a principled manner. Experimental results provide several insights for practical rate-distortion-perception analysis in learned image compression.
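For reference, the two objectives contrasted above are commonly written as follows. The notation is an assumption introduced here rather than taken from the abstract: $R$ is the rate, $D$ the distortion level, $P$ the perception level, $\lambda$ a Lagrange multiplier, $\Delta$ a distortion measure, and $d$ a divergence between distributions (the KL divergence in the formulation mentioned above).

```latex
% Conventional rate-distortion optimization (Lagrangian form, MSE distortion)
\min \; J = D + \lambda R,
\qquad D = \mathbb{E}\big[\lVert X - \hat{X} \rVert^2\big]

% Rate-distortion-perception function
R(D, P) = \min_{p_{\hat{X} \mid X}} I(X; \hat{X})
\quad \text{s.t.} \quad
\mathbb{E}\big[\Delta(X, \hat{X})\big] \le D,
\qquad
d\big(p_X, p_{\hat{X}}\big) \le P
```

Holding $R$ fixed, as the paper proposes, leaves a two-dimensional trade-off between the distortion and perception constraints.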

  • Research Article
  • 10.1109/jetcas.2025.3538652
Learned Image Compression With Efficient Cross-Platform Entropy Coding
  • Jan 1, 2025
  • IEEE Journal on Emerging and Selected Topics in Circuits and Systems
  • Runyu Yang + 3 more

Learned image compression has shown remarkable compression efficiency gain over the traditional image compression solutions, which is partially attributed to the learned entropy models and the adopted entropy coding engine. However, the inference of the entropy models and the sequential nature of the entropy coding both incur high time complexity. Meanwhile, the neural network-based entropy models usually involve floating-point computations, which incur inconsistent probability estimation and decoding failure in different platforms. We address these limitations by introducing an efficient and cross-platform entropy coding method, chain coding-based latent compression (CC-LC), into learned image compression. First, we leverage the classic chain coding and carefully design a block-based entropy coding procedure, significantly reducing the number of coding symbols and thus the coding time. Second, since CC-LC is not based on neural networks, we propose a rate estimation network as a surrogate of CC-LC during the end-to-end training. Third, we alternately train the analysis/synthesis networks and the rate estimation network for the rate-distortion optimization, making the learned latent fit CC-LC. Experimental results show that our method achieves much lower time complexity than the other learned image compression methods, ensures cross-platform consistency, and has comparable compression efficiency with BPG. Our code and models are publicly available at https://github.com/Yang-Runyu/CC-LC.
