Recent Advances in End-to-End Learned Image and Video Compression

Abstract

DCT-based transform coding has been used in international standards (ISO JPEG, ITU H.261/264/265, ISO MPEG-2/4/H, and many others) for nearly 30 years. Although researchers are still trying to improve its efficiency by fine-tuning its components and parameters, the basic structure has not changed in the past two decades. Recently developed deep learning technology may provide a new direction for constructing high-compression image/video coding systems. Recent results, particularly from the Challenge on Learned Image Compression (CLIC) at CVPR, indicate that this new type of scheme (often trained end-to-end) has good potential for further improving compression efficiency. In the first part of this tutorial, we shall (1) briefly summarize the progress on this topic over the past three or so years, including an overview of the CLIC results and the JPEG AI Call-for-Evidence Challenge on Learning-based Image Coding (issued in early 2020). Because Deep Neural Network (DNN)-based image compression is a new area, several techniques and structures have been tested. Recently published autoencoder-based schemes can achieve PSNR similar to BPG (Better Portable Graphics, the H.265 still image standard) and have superior subjective quality (e.g., MS-SSIM), especially at very low bit rates. In the second part, we shall (2) address the detailed design concepts of image compression algorithms that use the autoencoder structure. In the third part, we shall switch gears to (3) explore the emerging area of DNN-based video compression. Recent publications in this area indicate that end-to-end trained video compression can achieve rate-distortion performance comparable or superior to HEVC/H.265. CLIC at CVPR 2020 also created, for the first time, a new track dedicated to P-frame coding.
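The abstract compares codecs by PSNR. As a reminder of what that number measures, here is a minimal sketch for 8-bit pixel data using only the Python standard library. The function name `psnr` and the flat-list image representation are assumptions made for illustration and are not taken from the tutorial.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    `max_value` is the peak pixel value (255 for 8-bit images).
    """
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means a smaller mean squared error. It does not by itself capture perceived quality, which is why the abstract also refers to MS-SSIM.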

Similar Papers
  • Conference Article
  • Citations: 112
  • 10.1109/cvpr.2019.01031
Learning Image and Video Compression Through Spatial-Temporal Energy Compaction
  • Jun 1, 2019
  • Zhengxue Cheng + 3 more

Compression has been an important research topic for many decades, with a significant impact on data transmission and storage. Recent advances have shown the great potential of learned image and video compression. Inspired by related work, in this paper we present an image compression architecture using a convolutional autoencoder, and then generalize image compression to video compression by adding an interpolation loop on both the encoder and decoder sides. Our basic idea is to realize spatial-temporal energy compaction in learned image and video compression. To this end, we propose adding a spatial energy compaction-based penalty to the loss function to achieve higher image compression performance. Furthermore, based on the temporal energy distribution, we propose selecting the number of frames in one interpolation loop according to the motion characteristics of the video content. Experimental results demonstrate that our proposed image compression outperforms the latest image compression standard on the MS-SSIM quality metric, and provides higher performance than state-of-the-art learned compression methods at high bit rates, which benefits from our spatial energy compaction approach. Meanwhile, our proposed video compression approach with temporal energy compaction significantly outperforms MPEG-4 and is competitive with the commonly used H.264. Both our image and video compression can produce more visually pleasing results than traditional standards.

  • Research Article
  • Citations: 4
  • 10.1142/s0129156497000056
Image and Video Compression: A Review
  • Mar 1, 1997
  • International Journal of High Speed Electronics and Systems
  • Christine I Podilchuk + 1 more

The area of image and video compression has made tremendous progress over the last several decades. The successes in image compression are due to advances and better understanding of waveform coding methods which take advantage of the signal statistics, perceptual methods which take advantage of psychovisual properties of the human visual system (HVS) and object-based models especially for very low bit rate work. Recent years have produced several image coding standards—JPEG for still image compression and H.261, MPEG-I and MPEG-II for video compression. While we have devoted a special section in this paper to cover international coding standards because of their practical value, we have also covered a large class of nonstandard coding technology in the interest of completeness and potential future value. Very low bit rate video coding remains a challenging problem as does our understanding of the human visual system for perceptually optimum compression. The wide range of applications and bit rates, from video telephony at rates as low as 9.6 kbps to HDTV at 20 Mbps and higher, has acted as a catalyst for generating new ideas in tackling the different challenges characterized by the particular application. The area of image compression will remain an interesting and fruitful area of research as we focus on combining source coding with channel coding and multimedia networking.

  • Research Article
  • Citations: 2
  • 10.33103/uot.ijccce.23.1.11
Review on Fractal Video and Image Compression Techniques
  • Mar 30, 2023
  • Iraqi Journal of Computer, Communication, Control and System Engineering
  • Baydaa Z Sh + 3 more

Image compression is a step in image processing. It is concerned with the transmission and storage of digitally created images. Fractal coding is a promising image and video compression approach with good reconstruction fidelity and relatively large compression ratios. Because of its simplicity and performance, fractal image compression (FIC) is a particularly popular approach in image compression applications. However, it has a significant disadvantage in the form of a long encoding time, because encoding any small block requires a massive similarity search in the original data. As a result, many published papers aim to reduce the FIC search time while keeping the quality of the reconstructed images at an acceptable level, and this remains an active topic of study. Fractal images are self-similar, in that each individual part resembles the whole. This paper discusses many attempts by authors working on image and video compression using fractal compression techniques based on various approaches. Each discussion focuses on the main compression parameters, such as compression ratio (CR), peak signal-to-noise ratio (PSNR), and encoding time, as well as the details of the data set used for testing and how each technique creates fractal video and image compression. Index Terms: FIC, Video compression, parallel processing, Iterated Function System (IFS), Image processing.

  • Book Chapter
  • 10.5772/9301
LSCIC Pre Coder for Image and Video Compression
  • Mar 1, 2010
  • Muhammad Kamran + 2 more

Image and video compression schemes are implemented for the optimum reconstruction of images with respect to speed and quality. The LSCIC (Layered Scalable Concurrent Image Compression) pre-coder is introduced here to make the best use of available resources and obtain reasonably good image or video quality even at low system bandwidth. The pre-coder splits the input data, whether video or image, into layers and, after synchronization, sends them to its output on two different layers at the same time. Before addressing image compression, it is important to become familiar with the standard image formats used in particular applications, mainly JPEG, GIF, and TIFF. The image compression scenario is the main entity to be included in the dissertation as per our project requirement. A new idea for scalable concurrent image compression is introduced which gives superior image reconstruction performance compared to existing techniques. This can be verified by calculating the gray level and PSNR of the reconstructed image. The bit stream must be compressed for image data transfer if the main system requirements are memory saving and fast transformation with little sacrifice in image quality for a lossy compression scheme. A valuable study of parallel implementation of image and video compression was carried out by K. Shen (1997). It suggests that an ideal algorithm should have a low compressed data rate, high visual quality of the decoded image/video, and low computational complexity. In hardware approaches, special parallel architectures can be designed to accelerate computation, as suggested by R. J. Gove (1994) and Shinji Komori et al. (1988). Parallel video compression algorithms can be implemented using either hardware or software approaches, as shown by V. Bhaskaran (1995). These techniques provide guidelines for dealing with digital image compression schemes from the speed and complexity points of view.
For video compression, motion estimation has its own importance, and different motion estimation techniques have already been presented to obtain good image quality. Encoding is considered the first step of compression, followed by decoding at the receiving end, where the image is reconstructed. The intermediate step in data, image, and video compression is the transform. Different transform techniques have been used depending on the application.

  • Conference Article
  • Citations: 3
  • 10.1109/icip40778.2020.9190974
Shrinkage as Activation for Learned Image Compression
  • Oct 1, 2020
  • Ogun Kirmemis + 1 more

With recent advances in learned entropy and context models, the rate-distortion performance of deep learned image compression methods has reached or surpassed that of conventional codecs. However, learned image compression is currently more complex and slower than conventional image compression. Learned image and video compression methods almost exclusively employ the generalized divisive normalization (GDN) activation function. This paper investigates the effect of the activation function on the performance of image compression in terms of both objective and subjective criteria as well as runtime. In particular, we show that the distribution of latents produced by hard shrinkage fits a Laplacian better, and that it is possible to achieve similar rate-distortion and better visual performance using hard shrinkage with lower complexity.
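Hard shrinkage, the activation this abstract studies, zeroes every value whose magnitude does not exceed a threshold and passes the rest through unchanged. The sketch below shows only that elementwise rule. The function name and the default threshold of 0.5 are assumptions for illustration, since the abstract does not state the threshold the authors use.

```python
def hard_shrink(values, threshold=0.5):
    """Elementwise hard shrinkage: keep v if |v| > threshold, else output 0.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [v if abs(v) > threshold else 0.0 for v in values]


# Small latents are set exactly to zero; larger ones pass through unchanged.
latents = [-1.0, -0.2, 0.0, 0.3, 0.8]
shrunk = hard_shrink(latents, threshold=0.5)
```

Unlike GDN, this rule needs no division or square root per element, which is one plausible reading of the "lower complexity" mentioned in the abstract.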

  • Research Article
  • Citations: 20
  • 10.3390/rs15082211
Remote Sensing Image Compression Based on the Multiple Prior Information
  • Apr 21, 2023
  • Remote Sensing
  • Chuan Fu + 1 more

Learned image compression has achieved a series of breakthroughs for natural images, but there is little literature focusing on high-resolution remote sensing image (HRRSI) datasets. This paper focuses on designing a learned lossy image compression framework for compressing HRRSIs. Considering the local and non-local redundancy contained in HRRSIs, a mixed hyperprior network is designed to explore both kinds of redundancy in order to improve the accuracy of entropy estimation. In detail, a transformer-based hyperprior and a CNN-based hyperprior are fused for entropy estimation. Furthermore, to reduce the mismatch between training and testing, a three-stage training strategy is introduced to refine the network. In this training strategy, the entire network is first trained, and then some sub-networks are fixed while the others are trained. To evaluate the effectiveness of the proposed compression algorithm, experiments are conducted on an HRRSI dataset. The results show that the proposed algorithm achieves comparable or better compression performance than some traditional and learned image compression algorithms, such as Joint Photographic Experts Group (JPEG) and JPEG2000. At a similar or lower bitrate, the PSNR of the proposed algorithm is about 2 dB higher than that of JPEG2000.

  • Research Article
  • Citations: 16
  • 10.1109/tcsvt.2022.3229701
Learned Progressive Image Compression With Dead-Zone Quantizers
  • Jun 1, 2023
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Shaohui Li + 5 more

Progressive coding is essential to the practical deployment of learned image compression over heterogeneous networks and clients. Existing methods for learned progressive image compression require complex and empirical design to achieve near-optimal rate-distortion performance over a wide range of bit-rates. However, these methods are limited by the implicit learned mechanism based on neural networks and introduction of uniform quantizers. In this paper, we propose generalized learned progressive image compression with analytic rate-distortion optimization using dead-zone quantizers on the latent representation. Specifically, we reveal that dead-zone quantizers, as a general case of uniform quantizers, are equivalent to uniform quantizers in fixed-rate nonlinear transform coding and can prevent extra redundancy in embedded quantization for progressive coding. Consequently, we propose rate-distortion optimized learned progressive coding by approximating the optimal quantizer in the source spaces using dead-zone quantizers in an analytic manner on the Laplacian source. To our best knowledge, this paper is the first to achieve general learned progressive coding from the perspective of optimal quantizers. The proposed method achieves theoretically sound and practically efficient embedded quantization and learned progressive coding of latent representations with improved rate-distortion performance. It can also enable embedded quantization with diverse assignments of truncation points and support flexible configuration of quality layers of varying numbers and at varying target bit-rates. Furthermore, we successfully incorporate the proposed method into existing pre-trained fixed-rate models to realize progressive learned image compression without re-training. Experimental results demonstrate that the proposed method achieves state-of-the-art rate-distortion performance in learned progressive image compression compared with traditional codecs and recent learned methods.
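To make the idea of embedded quantization more concrete, here is a minimal sketch of a textbook dead-zone scalar quantizer (the sign-magnitude form with a zero bin twice as wide as the others). It is not the paper's rate-distortion optimized design. The function names, the reconstruction offset of 0.5, and the step sizes are assumptions for illustration.

```python
import math


def deadzone_quantize(x, step):
    """Map x to a signed integer index; values with |x| < step fall in the zero bin."""
    sign = -1 if x < 0 else 1
    return sign * int(math.floor(abs(x) / step))


def deadzone_dequantize(index, step, offset=0.5):
    """Reconstruct at a point inside the bin (offset 0.5 = bin midpoint, an assumption)."""
    if index == 0:
        return 0.0
    sign = -1 if index < 0 else 1
    return sign * (abs(index) + offset) * step


def drop_lsb(index):
    """Discard the least significant bit of the index magnitude (a coarser layer)."""
    sign = -1 if index < 0 else 1
    return sign * (abs(index) >> 1)


# Embedded property: truncating one bit of a fine index gives exactly the index
# that a quantizer with twice the step size would have produced.
samples = [-3.3, -0.4, 0.0, 0.7, 1.0, 2.9]
fine = [deadzone_quantize(x, 0.5) for x in samples]
coarse = [deadzone_quantize(x, 1.0) for x in samples]
truncated = [drop_lsb(q) for q in fine]
```

Because truncated fine indices coincide with coarse indices, a decoder that receives only the leading bits still gets a valid coarser quantization, which is the behaviour progressive coding relies on.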

  • Supplementary Content
  • 10.6092/polito/porto/2671060
Design and Optimization of Graph Transform for Image and Video Compression
  • Jan 1, 2017
  • Politecnico di Torino
  • Giulia Fracastoro

The main contribution of this thesis is the introduction of new methods for designing adaptive transforms for image and video compression. Exploiting graph signal processing techniques, we develop new graph construction methods targeted at image and video compression applications. In this way, we obtain a graph that is, at the same time, a good representation of the image and easy to transmit to the decoder. To do so, we investigate different research directions. First, we propose a new method for graph construction that employs innovative edge metrics, quantization and edge prediction techniques. Then, we propose to use a graph learning approach and we introduce a new graph learning algorithm targeted at image compression that defines the connectivities between pixels by taking into consideration the coding of the image signal and the graph topology in rate-distortion terms. Moreover, we also present a new superpixel-driven graph transform that uses clusters of superpixels as coding blocks and then computes the graph transform inside each region. In the second part of this work, we exploit graphs to design directional transforms. In fact, an efficient representation of the image directional information is extremely important in order to obtain high performance image and video coding. In this thesis, we present a new directional transform, called Steerable Discrete Cosine Transform (SDCT). This new transform can be obtained by steering the 2D-DCT basis in any chosen direction. Moreover, we can also use more complex steering patterns than a single pure rotation. In order to show the advantages of the SDCT, we present a few image and video compression methods based on this new directional transform. The obtained results show that the SDCT can be efficiently applied to image and video compression and that it outperforms the classical DCT and other directional transforms. Along the same lines, we also present a new generalization of the DFT, called Steerable DFT (SDFT).
Unlike the SDCT, the SDFT can be defined in one or two dimensions. The 1D-SDFT represents a rotation in the complex plane, whereas the 2D-SDFT performs a rotation in the 2D Euclidean space.

  • Research Article
  • 10.31673/2412-9070.2020.043237
Artificial neural network applications for data compression in video data transfer protocols
  • Jan 1, 2020
  • Connectivity
  • G Ya Kis + 1 more

The article describes the current state of data transfer protocols and methods of image and video compression through the use of artificial neural networks, namely convolutional multilayer networks and deep structured learning. Based on recent publications, a comparative analysis of the performance of classical compression methods and methods based on neural networks was performed. The most effective compression methods are those based on decorrelation transforms, namely the discrete cosine transform (JPEG standard) and the wavelet transform (JPEG-2000 standard). The transform coefficients have a well-understood physical meaning as spatial frequencies and can be further quantized for a more optimal representation of components that are less important for human perception. The HEVC standard guarantees a more efficient image compression scheme that further takes advantage of the similarity of adjacent blocks and uses interpolation (intracoding). Based on the HEVC standard, the BPG (Better Portable Graphics) format was developed for use on the Internet as an alternative to JPEG, and it is much more efficient than other standards. The overview of the current state of open standards provided in the article explains which properties of neural networks can be applied to image compression. There are two approaches to compression using neural networks: in the first, the neural network is used as a part of an existing algorithm (hybrid coding), and in the second, the neural network gives a concise representation of the data (compression network). Final conclusions are drawn regarding the application of these algorithms in the H.265 protocol (HEVC) and the possibility of creating a new protocol that is completely based on neural networks.
Protocols using neural networks show better results for image compression, but are currently hard to standardize so that the expected result is obtained across different network architectures. We may expect an increase in the need for video transmission in the future, which will run into the limitations of classical approaches. At the same time, the development of specialized processors for parallel data processing and the implementation of neural networks is currently underway. These two factors indicate that neural networks must be embedded into industrial data standards.

  • Book Chapter
  • Citations: 1
  • 10.1007/978-3-031-31417-9_33
Novel Image and Its Compressed Image Based on VVC Standard, Pair Data Set for Deep Learning Image and Video Compression Applications
  • Jan 1, 2023
  • Rohan Lal + 2 more

More than 80 percent of online traffic is video and image traffic, and this share will likely rise in the coming years. The data rate of images and video can grow along multiple dimensions, such as increasing frame resolution, frame depth, and multi-view representation. It is therefore crucial to compress these images and videos efficiently. The lack of sufficient experimental data is a major setback for the development of image and video compression based on deep learning models. This study presents a new kind of data set for the research community with the goal of advancing the state of the art in image compression using deep learning models. The proposed data set consists of each image and its corresponding VVC (Versatile Video Coding) standard-based compressed image as a label of the input image, for two quantization parameters. Images have been captured in different states of the Indian subcontinent. They contain common objects in their natural context; the campus of the Indian Institute of Technology Madras, which has rich flora and fauna and is home to several rare wildlife species; scenes from the Himalayas; clouds in Cherrapunji; indoor scenes; and more. The data set will be made publicly available to the research community. A statistical analysis of the data set is presented along with a coding analysis of the VVC compression standard.

  • Research Article
  • Citations: 66
  • 10.1109/tcsvt.2021.3119660
Learned Block-Based Hybrid Image Compression
  • Jun 1, 2022
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Yaojun Wu + 4 more

Recent works on learned image compression perform encoding and decoding processes in a full-resolution manner, resulting in two problems when deployed for practical applications. First, parallel acceleration of the autoregressive entropy model cannot be achieved due to serial decoding. Second, full-resolution inference often causes the out-of-memory (OOM) problem with limited GPU resources, especially for high-resolution images. Block partition is a good choice to handle the above issues, but it brings about new challenges in reducing the redundancy between blocks and eliminating block effects. To tackle the above challenges, this paper provides a learned block-based hybrid image compression (LBHIC) framework. Specifically, we introduce explicit intra prediction into a learned image compression framework to utilize the relation among adjacent blocks. Superior to context modeling by linear weighting of neighbor pixels in traditional codecs, we propose a contextual prediction module (CPM) to better capture long-range correlations by utilizing the strip pooling to extract the most relevant information in neighboring latent space, thus achieving effective information prediction. Moreover, to alleviate blocking artifacts, we further propose a boundary-aware postprocessing module (BPM) with the edge importance taken into account. Extensive experiments demonstrate that the proposed LBHIC codec outperforms the VVC, with a bit-rate conservation of 4.1%, and reduces the decoding time by approximately 86.7% compared with that of state-of-the-art learned image compression methods.

  • Dissertation
  • 10.33915/etd.13084
Neural Network-based Image Compression
  • Jan 1, 2025
  • Atefeh Khoshkhahtinat

The rapid advancement of information technology and the exponential growth of digital communication have significantly increased the demand for efficient data compression techniques that reduce storage requirements, minimize bandwidth consumption, and accelerate data transmission—without substantially compromising data quality. This dissertation addresses these challenges by investigating and developing advanced learned image compression (LIC) methods, with a particular focus on lossy compression for both natural images and scientific imagery obtained from NASA’s Solar Dynamics Observatory (SDO) mission. Traditional image compression standards—such as JPEG, JPEG2000, BPG, and HEVC—rely on manually engineered transforms and heuristic rules, which often lack the adaptability required to accommodate diverse visual content and application-specific constraints. In contrast, learned image compression employs deep neural networks trained in an end-to-end manner, guided by principles from rate–distortion theory, to optimize the trade-off between compression efficiency and reconstruction fidelity. In the first part of this dissertation, several technical challenges in developing neural image compression codecs for natural images (general-purpose) are addressed, including the design of expressive nonlinear transforms, accurate entropy modeling, and the integration of perceptually meaningful loss functions. To this end, several learned image compression frameworks are proposed, each introducing distinct design innovations: a Transformer-based nonlinear transform that captures both local and global dependencies, an advanced entropy model that improves probability estimation and coding efficiency, and a conditional diffusion-based generative framework that enhances the perceptual quality of reconstructed images. The second part focuses on the application of learned compression to imagery from NASA’s Solar Dynamics Observatory (SDO) mission. 
A learned video compression framework is developed to exploit both spatial and temporal redundancies in solar image sequences. Furthermore, an adaptive compression strategy is introduced to prioritize scientific relevance: images containing solar flare events are compressed at lower ratios to preserve critical information, whereas non-flare images are compressed more aggressively to maximize storage and transmission efficiency. Collectively, these contributions advance the field of learned image compression across both general-purpose and scientific imaging domains, providing practical solutions for improving data transmission and storage efficiency in real-world and mission-critical environments.
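The trade-off between compression efficiency and reconstruction fidelity described in this abstract is commonly written as a single training objective, rate plus a weighted distortion. The sketch below computes that quantity for toy inputs. The function names, the use of mean squared error as the distortion, and the weight `lam` are assumptions for illustration and are not taken from the dissertation.

```python
import math


def rate_bits_per_pixel(likelihoods, num_pixels):
    """Estimated rate: total of -log2 p over all coded symbols, per image pixel."""
    return sum(-math.log2(p) for p in likelihoods) / num_pixels


def mean_squared_error(original, reconstruction):
    """Distortion term (MSE is one common choice, assumed here)."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstruction)) / len(original)


def rate_distortion_loss(likelihoods, original, reconstruction, lam):
    """Rate + lam * distortion; a larger lam favours fidelity over bit savings."""
    rate = rate_bits_per_pixel(likelihoods, len(original))
    return rate + lam * mean_squared_error(original, reconstruction)
```

In an end-to-end learned codec, the likelihoods would come from the entropy model and the reconstruction from the decoder network, so minimizing this one number trains both together.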

  • Conference Article
  • Citations: 12
  • 10.1109/vcip49819.2020.9301828
Learned image and video compression with deep neural networks
  • Dec 1, 2020
  • Dong Xu + 3 more

This tutorial reviews recent progress in deep learning-based data compression, including image compression and video compression. In the past years, deep learning techniques have been successfully applied to a large number of computer vision and image processing tasks. However, for the data compression task, traditional approaches (e.g., block-based motion estimation and motion compensation) are still widely employed in the mainstream codecs. Given their powerful representation capability, advanced deep learning technologies make it possible to improve data compression performance. To this end, deep learning-based compression approaches have recently received significant attention from both academia and industry in the field of computer vision and image/video compression. In this tutorial, we will introduce the related deep learning techniques for image compression and video compression. Specifically, we will first introduce the basic pipeline of traditional codecs, such as JPEG, H.264 and HEVC. Then, we will discuss the common network architectures for visual data compression and analyse different learning-based entropy models. Based on these techniques, we will describe several widely used end-to-end optimized frameworks for visual data compression. In summary, our tutorial will cover both the traditional data coding techniques and the popular learning-based visual data compression algorithms, which will help audiences with different backgrounds learn about recent progress in this emerging research area.

  • Conference Article
  • Citations: 2
  • 10.1109/cccai59026.2023.00041
HFLIC: Human Friendly Perceptual Learned Image Compression with Reinforced Transform
  • Jun 1, 2023
  • Peirong Ning + 2 more

In recent years, there has been rapid development in learned image compression techniques that prioritize rate-distortion-perceptual compression, preserving fine details even at lower bit-rates. However, current learning-based image compression methods often sacrifice human-friendly compression and require long decoding times. In this paper, we propose enhancements to the backbone network and loss function of an existing image compression model, focusing on improving human perception and efficiency. Our proposed approach achieves competitive subjective results compared to state-of-the-art end-to-end learned image compression methods and classic methods, while requiring less decoding time and offering human-friendly compression. Through empirical evaluation, we demonstrate the effectiveness of our proposed method, which achieves more than 25% bit-rate savings at comparable perceptual quality.

  • Research Article
  • 10.1109/tcsvt.2024.3522621
Sparse Point Clouds Assisted Learned Image Compression
  • May 1, 2025
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Yiheng Jiang + 4 more

In the field of autonomous driving, a variety of sensor data types exist, each representing a different modality of the same scene. Therefore, it is feasible to utilize data from other sensors to facilitate image compression. However, few techniques have explored the potential benefits of utilizing inter-modality correlations to enhance image compression performance. In this paper, motivated by the recent success of learned image compression, we propose a new framework that uses sparse point clouds to assist learned image compression in the autonomous driving scenario. We first project the 3D sparse point cloud onto a 2D plane, resulting in a sparse depth map. Utilizing this depth map, we proceed to predict camera images. Subsequently, we use these predicted images to extract multi-scale structural features. These features are then incorporated into the learned image compression pipeline as additional information to improve the compression performance. Our proposed framework is compatible with various mainstream learned image compression models, and we validate our approach using different existing image compression methods. The experimental results show that incorporating point cloud assistance into the compression pipeline consistently enhances the performance.
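The first step in this abstract, projecting a sparse 3D point cloud onto the image plane to obtain a sparse depth map, can be sketched with a plain pinhole camera model. This is an illustration under stated assumptions, not the authors' pipeline: the points are assumed to be already expressed in the camera frame, and the intrinsics `fx`, `fy`, `cx`, `cy` and the function name are hypothetical.

```python
def project_to_depth_map(points, fx, fy, cx, cy, width, height):
    """Project camera-frame 3D points (x, y, z) into a sparse depth map.

    Pixels that receive no point stay 0.0. If several points land on the
    same pixel, the nearest one (smallest z) is kept.
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue  # point is behind the camera
        # Pinhole projection to pixel coordinates.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth


# Toy example: two points on the optical axis, one behind the camera,
# and one that projects outside a 3x3 image.
cloud = [(0.0, 0.0, 2.0), (0.0, 0.0, 5.0), (1.0, 0.0, -1.0), (100.0, 0.0, 1.0)]
depth_map = project_to_depth_map(cloud, fx=1.0, fy=1.0, cx=1.0, cy=1.0, width=3, height=3)
```

Most pixels remain empty because a LiDAR-style cloud has far fewer points than the image has pixels, which is why the abstract calls the resulting depth map sparse.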
