Shrinkage as Activation for Learned Image Compression

Abstract

With recent advances in learned entropy and context models, the rate-distortion performance of deep learned image compression methods has reached or surpassed that of conventional codecs. However, learned image compression is currently more complex and slower than conventional image compression. Learned image and video compression methods almost exclusively employ the generalized divisive normalization (GDN) activation function. This paper investigates the effect of the activation function on the performance of image compression in terms of both objective and subjective criteria as well as runtime. In particular, we show that the distribution of latents produced by hard shrinkage fits a Laplacian better, and that it is possible to achieve similar rate-distortion and better visual performance using hard shrinkage with lower complexity.
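
For readers unfamiliar with the term, hard shrinkage is the elementwise function that zeroes every value whose magnitude does not exceed a threshold and passes all other values through unchanged. The sketch below is a minimal NumPy illustration of that standard definition, not the paper's implementation: the function name `hard_shrink`, the threshold value `lam=0.5`, and the sample latents are assumptions chosen for the example, since the abstract does not state them.

```python
import numpy as np


def hard_shrink(x, lam=0.5):
    """Elementwise hard shrinkage: keep x where |x| > lam, else output 0.

    `lam` is a hypothetical threshold chosen for illustration only.
    """
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > lam, x, 0.0)


# Hypothetical latent values; small-magnitude entries are set to zero,
# large-magnitude entries are left untouched.
latents = np.array([-1.2, -0.3, 0.0, 0.4, 0.5, 2.0])
shrunk = hard_shrink(latents)
print(shrunk)
```

Unlike GDN, which normalizes each channel using learned parameters across channels, this function involves only a comparison and a select per element, which is consistent with the lower complexity the abstract reports.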

Similar Papers
  • Research Article
  • Cite Count: 16
  • 10.1109/tcsvt.2022.3229701
Learned Progressive Image Compression With Dead-Zone Quantizers
  • Jun 1, 2023
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Shaohui Li + 5 more

Progressive coding is essential to the practical deployment of learned image compression over heterogeneous networks and clients. Existing methods for learned progressive image compression require complex and empirical design to achieve near-optimal rate-distortion performance over a wide range of bit-rates. However, these methods are limited by the implicit learned mechanism based on neural networks and introduction of uniform quantizers. In this paper, we propose generalized learned progressive image compression with analytic rate-distortion optimization using dead-zone quantizers on the latent representation. Specifically, we reveal that dead-zone quantizers, as a general case of uniform quantizers, are equivalent to uniform quantizers in fixed-rate nonlinear transform coding and can prevent extra redundancy in embedded quantization for progressive coding. Consequently, we propose rate-distortion optimized learned progressive coding by approximating the optimal quantizer in the source spaces using dead-zone quantizers in an analytic manner on the Laplacian source. To our best knowledge, this paper is the first to achieve general learned progressive coding from the perspective of optimal quantizers. The proposed method achieves theoretically sound and practically efficient embedded quantization and learned progressive coding of latent representations with improved rate-distortion performance. It can also enable embedded quantization with diverse assignments of truncation points and support flexible configuration of quality layers of varying numbers and at varying target bit-rates. Furthermore, we successfully incorporate the proposed method into existing pre-trained fixed-rate models to realize progressive learned image compression without re-training. Experimental results demonstrate that the proposed method achieves state-of-the-art rate-distortion performance in learned progressive image compression compared with traditional codecs and recent learned methods.

  • Conference Article
  • Cite Count: 1
  • 10.1109/icoin50884.2021.9333956
Learned Image Compression with Frequency Domain Loss
  • Jan 13, 2021
  • Soonbin Lee + 3 more

This paper proposes an end-to-end deep image compression model with a frequency domain loss function. Unlike previous deep image compression methods, the model is computed jointly in the frequency domain. By calculating in the frequency domain, the model incorporates high-frequency components to capture detailed information in the reconstructed images effectively. Frequency-domain processing is a concept universal to modern image/video codecs (e.g., JPEG), but it has seldom been investigated in deep image compression models based on neural networks. The model shows better image compression performance when visual quality is measured using the peak signal-to-noise ratio, and its rate-distortion performance exceeds that of traditional neural-network-based models when the model is trained jointly in the frequency domain. The model improves image compression performance especially at low bitrates. Moreover, the method can easily be applied to other compression models.

  • Conference Article
  • Cite Count: 112
  • 10.1109/cvpr.2019.01031
Learning Image and Video Compression Through Spatial-Temporal Energy Compaction
  • Jun 1, 2019
  • Zhengxue Cheng + 3 more

Compression has been an important research topic for many decades, to produce a significant impact on data transmission and storage. Recent advances have shown a great potential of learning image and video compression. Inspired from related works, in this paper, we present an image compression architecture using a convolutional autoencoder, and then generalize image compression to video compression, by adding an interpolation loop into both encoder and decoder sides. Our basic idea is to realize spatial-temporal energy compaction in learning image and video compression. Thereby, we propose to add a spatial energy compaction-based penalty into loss function, to achieve higher image compression performance. Furthermore, based on temporal energy distribution, we propose to select the number of frames in one interpolation loop, adapting to the motion characteristics of video contents. Experimental results demonstrate that our proposed image compression outperforms the latest image compression standard with MS-SSIM quality metric, and provides higher performance compared with state-of-the-art learning compression methods at high bit rates, which benefits from our spatial energy compaction approach. Meanwhile, our proposed video compression approach with temporal energy compaction can significantly outperform MPEG-4 and is competitive with commonly used H.264. Both our image and video compression can produce more visually pleasant results than traditional standards.

  • Research Article
  • 10.1109/tmm.2026.3651136
Learned Image Compression Via Local-to-Global Cross-Component Prior
  • Jan 1, 2026
  • IEEE Transactions on Multimedia
  • Wenhong Duan + 6 more

Learned image compression (LIC) methods have shown promising results and achieved superior performance compared to traditional image compression methods. Due to the neglect of the utilization of cross-component correlations, there is still a potential for further performance improvement. In this paper, we first explore the inter-channel correlations of different color spaces and transform the image compression problem in RGB color space into that in YUV color space, which has cross-component prior information. We propose a novel image compression method that leverages local-to-global cross-component prior modeling, utilizing a cross-component attention mechanism to improve coding performance. First, we design the cross-component prior gate (CPG) to model the cross-component prior information based on attention mechanism. Inspired by common knowledge in data compression, luma component (Y) contains more details and textural/structural information compared to chroma components (UV). The proposed method can make full use of the cross-component guidance information from luma to chroma components to achieve effective image compression. Experimental results demonstrate that the proposed method can achieve superior performance compared to existing learned image compression methods. The proposed method can achieve 9.20% rate savings compared to the image compression standard Versatile Video Coding (VVC) Test Model (VTM-11.0) on Kodak dataset.

  • Research Article
  • Cite Count: 20
  • 10.1109/tip.2023.3319275
Learned Image Compression Using Cross-Component Attention Mechanism
  • Jan 1, 2023
  • IEEE Transactions on Image Processing
  • Wenhong Duan + 6 more

Learned image compression methods have achieved satisfactory results in recent years. However, existing methods are typically designed for RGB format, which are not suitable for YUV420 format due to the variance of different formats. In this paper, we propose an information-guided compression framework using cross-component attention mechanism, which can achieve efficient image compression in YUV420 format. Specifically, we design a dual-branch advanced information-preserving module (AIPM) based on the information-guided unit (IGU) and attention mechanism. On the one hand, the dual-branch architecture can prevent changes in original data distribution and avoid information disturbance between different components. The feature attention block (FAB) can preserve the important information. On the other hand, IGU can efficiently utilize the correlations between Y and UV components, which can further preserve the information of UV by the guidance of Y. Furthermore, we design an adaptive cross-channel enhancement module (ACEM) to reconstruct the details by utilizing the relations from different components, which makes use of the reconstructed Y as the textural and structural guidance for UV components. Extensive experiments show that the proposed framework can achieve the state-of-the-art performance in image compression for YUV420 format. More importantly, the proposed framework outperforms Versatile Video Coding (VVC) with 8.37% BD-rate reduction on common test conditions (CTC) sequences on average. In addition, we propose a quantization scheme for context model without model retraining, which can overcome the cross-platform decoding error caused by the floating-point operations in context model and provide a reference approach for the application of neural codec on different platforms.

  • Research Article
  • Cite Count: 9
  • 10.1016/j.neucom.2022.07.065
Successive learned image compression: Comprehensive analysis of instability
  • Jul 22, 2022
  • Neurocomputing
  • Jun-Hyuk Kim + 3 more

  • Research Article
  • 10.1109/tcsvt.2024.3522621
Sparse Point Clouds Assisted Learned Image Compression
  • May 1, 2025
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Yiheng Jiang + 4 more

In the field of autonomous driving, a variety of sensor data types exist, each representing different modalities of the same scene. Therefore, it is feasible to utilize data from other sensors to facilitate image compression. However, few techniques have explored the potential benefits of utilizing inter-modality correlations to enhance the image compression performance. In this paper, motivated by the recent success of learned image compression, we propose a new framework that uses sparse point clouds to assist in learned image compression in the autonomous driving scenario. We first project the 3D sparse point cloud onto a 2D plane, resulting in a sparse depth map. Utilizing this depth map, we proceed to predict camera images. Subsequently, we use these predicted images to extract multi-scale structural features. These features are then incorporated into learned image compression pipeline as additional information to improve the compression performance. Our proposed framework is compatible with various mainstream learned image compression models, and we validate our approach using different existing image compression methods. The experimental results show that incorporating point cloud assistance into the compression pipeline consistently enhances the performance.

  • Research Article
  • Cite Count: 16
  • 10.1109/tbc.2024.3464413
JND-LIC: Learned Image Compression via Just Noticeable Difference for Human Visual Perception
  • Mar 1, 2025
  • IEEE Transactions on Broadcasting
  • Zhaoqing Pan + 6 more

Existing human visual perception-oriented image compression methods well maintain the perceptual quality of compressed images, but they may introduce fake details into the compressed images, and cannot dynamically improve the perceptual rate-distortion performance at the pixel level. To address these issues, a just noticeable difference (JND)-based learned image compression (JND-LIC) method is proposed for human visual perception in this paper, in which a weight-shared model is used to extract image features and JND features, and the learned JND features are utilized as perceptual prior knowledge to assist the image coding process. In order to generate a highly compact image feature representation, a JND-based feature transform module is proposed to model the pixel-to-pixel masking correlation between the image features and the JND features. Furthermore, inspired by eye movement research that the human visual system perceives image degradation unevenly, a JND-guided quantization mechanism is proposed for the entropy coding, which adjusts the quantization step of each pixel to further eliminate perceptual redundancies. Extensive experimental results show that our proposed JND-LIC significantly improves the perceptual quality of compressed images with fewer coding bits compared to state-of-the-art learned image compression methods. Additionally, the proposed method can be flexibly integrated with various advanced learned image compression methods, and has robust generalization capabilities to improve the efficiency of perceptual coding.

  • Research Article
  • Cite Count: 66
  • 10.1109/tcsvt.2021.3119660
Learned Block-Based Hybrid Image Compression
  • Jun 1, 2022
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Yaojun Wu + 4 more

Recent works on learned image compression perform encoding and decoding processes in a full-resolution manner, resulting in two problems when deployed for practical applications. First, parallel acceleration of the autoregressive entropy model cannot be achieved due to serial decoding. Second, full-resolution inference often causes the out-of-memory (OOM) problem with limited GPU resources, especially for high-resolution images. Block partition is a good choice to handle the above issues, but it brings about new challenges in reducing the redundancy between blocks and eliminating block effects. To tackle the above challenges, this paper provides a learned block-based hybrid image compression (LBHIC) framework. Specifically, we introduce explicit intra prediction into a learned image compression framework to utilize the relation among adjacent blocks. Superior to context modeling by linear weighting of neighbor pixels in traditional codecs, we propose a contextual prediction module (CPM) to better capture long-range correlations by utilizing the strip pooling to extract the most relevant information in neighboring latent space, thus achieving effective information prediction. Moreover, to alleviate blocking artifacts, we further propose a boundary-aware postprocessing module (BPM) with the edge importance taken into account. Extensive experiments demonstrate that the proposed LBHIC codec outperforms the VVC, with a bit-rate conservation of 4.1%, and reduces the decoding time by approximately 86.7% compared with that of state-of-the-art learned image compression methods.

  • Dissertation
  • 10.33915/etd.13084
Neural Network-based Image Compression
  • Jan 1, 2025
  • Atefeh Khoshkhahtinat

The rapid advancement of information technology and the exponential growth of digital communication have significantly increased the demand for efficient data compression techniques that reduce storage requirements, minimize bandwidth consumption, and accelerate data transmission—without substantially compromising data quality. This dissertation addresses these challenges by investigating and developing advanced learned image compression (LIC) methods, with a particular focus on lossy compression for both natural images and scientific imagery obtained from NASA’s Solar Dynamics Observatory (SDO) mission. Traditional image compression standards—such as JPEG, JPEG2000, BPG, and HEVC—rely on manually engineered transforms and heuristic rules, which often lack the adaptability required to accommodate diverse visual content and application-specific constraints. In contrast, learned image compression employs deep neural networks trained in an end-to-end manner, guided by principles from rate–distortion theory, to optimize the trade-off between compression efficiency and reconstruction fidelity. In the first part of this dissertation, several technical challenges in developing neural image compression codecs for natural images (general-purpose) are addressed, including the design of expressive nonlinear transforms, accurate entropy modeling, and the integration of perceptually meaningful loss functions. To this end, several learned image compression frameworks are proposed, each introducing distinct design innovations: a Transformer-based nonlinear transform that captures both local and global dependencies, an advanced entropy model that improves probability estimation and coding efficiency, and a conditional diffusion-based generative framework that enhances the perceptual quality of reconstructed images. The second part focuses on the application of learned compression to imagery from NASA’s Solar Dynamics Observatory (SDO) mission. 
A learned video compression framework is developed to exploit both spatial and temporal redundancies in solar image sequences. Furthermore, an adaptive compression strategy is introduced to prioritize scientific relevance: images containing solar flare events are compressed at lower ratios to preserve critical information, whereas non-flare images are compressed more aggressively to maximize storage and transmission efficiency. Collectively, these contributions advance the field of learned image compression across both general-purpose and scientific imaging domains, providing practical solutions for improving data transmission and storage efficiency in real-world and mission-critical environments.

  • Conference Article
  • Cite Count: 8
  • 10.24132/csrn.2019.2901.1.7
Evaluation of 4D Light Field Compression Methods
  • Jan 1, 2019
  • Computer Science Research Notes
  • David Barina + 4 more

Light field data records the amount of light at multiple points in space, captured e.g. by an array of cameras or by a light-field camera that uses microlenses. Since the storage and transmission requirements for such data are tremendous, compression techniques for light fields have been gaining momentum in recent years. Although plenty of efficient compression formats exist for still and moving images, only a little research has been performed on the impact of these methods on light field imagery. In this paper, we evaluate the impact of state-of-the-art image and video compression methods on the quality of images rendered from light field data. The methods include recent video compression standards, especially AV1 and XVC, finalised in 2018. To fully exploit the potential of common image compression methods on four-dimensional light field imagery, we have extended these methods into three and four dimensions. We show that four-dimensional light field data can be compressed much more than independent still images while maintaining the same visual quality of a perceived picture. We compare the compression performance of all image and video compression methods step by step, and eventually answer the question "What is the best compression method for light field data?".

  • Research Article
  • Cite Count: 3
  • 10.1109/tip.2025.3567830
Approximately Invertible Neural Network for Learned Image Compression
  • Jan 1, 2025
  • IEEE Transactions on Image Processing
  • Yanbo Gao + 7 more

Learned image compression has attracted considerable interest in recent years. An analysis transform and a synthesis transform, which can be regarded as coupled transforms, are used to encode an image into a latent feature and to decode the feature after quantization to reconstruct the image. Inspired by the success of invertible neural networks in generative modeling, invertible modules can be used to construct the coupled analysis and synthesis transforms. Considering that the noise introduced in the feature quantization invalidates the invertible process, this paper proposes an Approximately Invertible Neural Network (A-INN) framework for learned image compression. It formulates the rate-distortion optimization in lossy image compression when using INN with quantization, which differs from using INN for generative modelling. Generally speaking, A-INN can be used as the theoretical foundation for any INN-based lossy compression method. Based on this formulation, A-INN with a progressive denoising module (PDM) is developed to effectively reduce the quantization noise in the decoding. Moreover, a Cascaded Feature Recovery Module (CFRM) is designed to learn high-dimensional feature recovery from low-dimensional ones to further reduce the noise in feature channel compression. In addition, a Frequency-enhanced Decomposition and Synthesis Module (FDSM) is developed by explicitly enhancing the high-frequency components in an image to address the loss of high-frequency information inherent in neural-network-based image compression, thereby enhancing the reconstructed image quality. Extensive experiments demonstrate that the proposed A-INN framework achieves compression efficiency better than or comparable to that of the conventional image compression approach and state-of-the-art learned image compression methods.

  • Research Article
  • Cite Count: 20
  • 10.3390/rs15082211
Remote Sensing Image Compression Based on the Multiple Prior Information
  • Apr 21, 2023
  • Remote Sensing
  • Chuan Fu + 1 more

Learned image compression has achieved a series of breakthroughs for natural images, but there is little literature focusing on high-resolution remote sensing image (HRRSI) datasets. This paper focuses on designing a learned lossy image compression framework for compressing HRRSIs. Considering the local and non-local redundancy contained in HRRSIs, a mixed hyperprior network is designed to explore both kinds of redundancy in order to improve the accuracy of entropy estimation. In detail, a transformer-based hyperprior and a CNN-based hyperprior are fused for entropy estimation. Furthermore, to reduce the mismatch between training and testing, a three-stage training strategy is introduced to refine the network. In this training strategy, the entire network is first trained, and then some sub-networks are fixed while the others are trained. To evaluate the effectiveness of the proposed compression algorithm, experiments are conducted on an HRRSI dataset. The results show that the proposed algorithm achieves comparable or better compression performance than some traditional and learned image compression algorithms, such as Joint Photographic Experts Group (JPEG) and JPEG2000. At a similar or lower bitrate, the proposed algorithm achieves a PSNR about 2 dB higher than that of JPEG2000.

  • Research Article
  • Cite Count: 3
  • 10.5539/mas.v3n2p134
A Novel Method of Image Compression Using Multiwavelets and Set Partitioning Algorithm
  • Jan 13, 2009
  • Modern Applied Science
  • U.S. Ragupathy + 1 more

Advances in wavelet transforms and quantization methods have produced algorithms capable of surpassing existing image compression standards such as the Joint Photographic Experts Group (JPEG) algorithm. Existing compression methods for the JPEG standards use the DCT with arithmetic coding and the DWT with Huffman coding. The DCT uses a single kernel, whereas wavelets offer a larger number of filters, depending on the application. The wavelet-based Set Partitioning In Hierarchical Trees (SPIHT) algorithm gives better compression. For best performance in image compression, wavelet transforms require filters that combine a number of desirable properties, such as orthogonality and symmetry, but they cannot simultaneously possess all of these properties. The relatively new field of multiwavelets offers more design options and can combine all desirable transform features. However, there are some limitations in using the SPIHT algorithm for multiwavelet coefficients. This paper presents a new method for encoding multiwavelet-decomposed images by defining coefficients suitable for the SPIHT algorithm, which gives better compression performance than existing methods in many cases.

  • Conference Article
  • Cite Count: 6
  • 10.1109/iciinfs.2008.4798373
New Method of Image Compression Using Multiwavelets and Set Partitioning Algorithm
  • Dec 1, 2008
  • U.S. Ragupathy + 2 more

Advances in wavelet transforms and quantization methods have produced algorithms capable of surpassing existing image compression standards such as the Joint Photographic Experts Group (JPEG) algorithm. Existing compression methods for the JPEG standards use the DCT with arithmetic coding and the DWT with Huffman coding. The DCT uses a single kernel, whereas wavelets offer a larger number of filters, depending on the application. The wavelet-based set partitioning in hierarchical trees (SPIHT) algorithm gives better compression. For best performance in image compression, wavelet transforms require filters that combine a number of desirable properties, such as orthogonality and symmetry, but they cannot simultaneously possess all of these properties. The relatively new field of multiwavelets offers more design options and can combine all desirable transform features. However, there are some limitations in using the SPIHT algorithm for multiwavelet coefficients. This paper presents a new method for encoding multiwavelet-decomposed images by defining coefficients suitable for the SPIHT algorithm, which gives better compression performance than existing methods in many cases.
