
Learned image compression via neighborhood-based attention optimization and context modeling with multi-scale guiding

Similar Papers
  • Conference Article
  • Citations: 3
  • 10.1109/icip40778.2020.9190974
Shrinkage as Activation for Learned Image Compression
  • Oct 1, 2020
  • Ogun Kirmemis + 1 more

With recent advances in learned entropy and context models, the rate-distortion performance of deep learned image compression methods has reached or surpassed that of conventional codecs. However, learned image compression is currently more complex and slower than conventional image compression. Learned image and video compression methods almost exclusively employ the generalized divisive normalization (GDN) activation function. This paper investigates the effect of the activation function on the performance of image compression in terms of both objective and subjective criteria as well as runtime. In particular, we show that the distribution of latents produced by hard shrinkage fits a Laplacian better, and that it is possible to achieve similar rate-distortion performance and better visual performance using hard shrinkage with lower complexity.
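
A minimal NumPy sketch contrasting the two activations discussed above. Hard shrinkage keeps a value only when its magnitude exceeds a threshold λ (the value 0.5 here is an arbitrary assumption); the GDN shown is the simplified channel-wise form y_i = x_i / sqrt(β_i + Σ_j γ_ij x_j²) over a (C, H, W) tensor, without the parameter reparameterization used in trained models. It illustrates the operations only and is not the paper's code.

```python
import numpy as np

def hard_shrink(x: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Hard shrinkage: keep entries with |x| > lam, set the rest to zero."""
    return np.where(np.abs(x) > lam, x, 0.0)

def gdn(x: np.ndarray, beta: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    """Simplified GDN on a (C, H, W) tensor.

    y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j**2), computed per pixel.
    """
    # Cross-channel weighted sum of squares: (C, C) x (C, H, W) -> (C, H, W).
    norm = np.einsum("ij,jhw->ihw", gamma, x ** 2)
    return x / np.sqrt(beta[:, None, None] + norm)

x = np.array([[[-1.0, -0.2], [0.3, 2.0]]])               # one channel, 2x2
print(hard_shrink(x))                                     # small entries become 0
print(gdn(x, beta=np.ones(1), gamma=np.zeros((1, 1))))   # identity when gamma = 0
```

Hard shrinkage produces exact zeros below the threshold, which is one plausible intuition for latents that are sharply peaked at zero, as a Laplacian is.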

  • Research Article
  • Citations: 66
  • 10.1109/tcsvt.2021.3119660
Learned Block-Based Hybrid Image Compression
  • Jun 1, 2022
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Yaojun Wu + 4 more

Recent works on learned image compression perform encoding and decoding in a full-resolution manner, which causes two problems in practical deployment. First, the autoregressive entropy model cannot be accelerated in parallel because decoding is serial. Second, full-resolution inference often causes out-of-memory (OOM) errors on limited GPU resources, especially for high-resolution images. Block partitioning is a good way to handle these issues, but it introduces new challenges: reducing the redundancy between blocks and eliminating blocking effects. To tackle these challenges, this paper provides a learned block-based hybrid image compression (LBHIC) framework. Specifically, we introduce explicit intra prediction into a learned image compression framework to exploit the relations among adjacent blocks. Rather than modeling context by linearly weighting neighboring pixels as traditional codecs do, we propose a contextual prediction module (CPM) that better captures long-range correlations by using strip pooling to extract the most relevant information in the neighboring latent space, thus achieving effective information prediction. Moreover, to alleviate blocking artifacts, we further propose a boundary-aware postprocessing module (BPM) that takes edge importance into account. Extensive experiments demonstrate that the proposed LBHIC codec outperforms VVC with a bit-rate saving of 4.1%, and reduces the decoding time by approximately 86.7% compared with state-of-the-art learned image compression methods.
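
Strip pooling, which the CPM uses to gather long-range context, can be sketched as averaging over whole rows and whole columns of a feature map and fusing the two results. This is a simplified illustration under stated assumptions: the learned 1D convolutions and gating that a full strip-pooling module normally applies, and the CPM's actual fusion, are omitted.

```python
import numpy as np

def strip_pool(feat: np.ndarray) -> np.ndarray:
    """Parameter-free strip pooling on a (C, H, W) feature map.

    Every output position receives the mean of its whole row plus the mean
    of its whole column, so it sees context spanning the full width and
    height at O(H + W) pooled values per channel.
    """
    row_mean = feat.mean(axis=2, keepdims=True)  # (C, H, 1): pool along width
    col_mean = feat.mean(axis=1, keepdims=True)  # (C, 1, W): pool along height
    return row_mean + col_mean                   # broadcasts to (C, H, W)

feat = np.arange(6, dtype=float).reshape(1, 2, 3)
print(strip_pool(feat))
```

Long, thin pooling windows like these reach across an entire row or column in one step, which a small square convolution kernel cannot.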

  • Research Article
  • Citations: 16
  • 10.1109/tcsvt.2022.3229701
Learned Progressive Image Compression With Dead-Zone Quantizers
  • Jun 1, 2023
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Shaohui Li + 5 more

Progressive coding is essential to the practical deployment of learned image compression over heterogeneous networks and clients. Existing methods for learned progressive image compression require complex and empirical design to achieve near-optimal rate-distortion performance over a wide range of bit-rates. However, these methods are limited by the implicit learned mechanism based on neural networks and the introduction of uniform quantizers. In this paper, we propose generalized learned progressive image compression with analytic rate-distortion optimization using dead-zone quantizers on the latent representation. Specifically, we reveal that dead-zone quantizers, as a general case of uniform quantizers, are equivalent to uniform quantizers in fixed-rate nonlinear transform coding and can prevent extra redundancy in embedded quantization for progressive coding. Consequently, we propose rate-distortion optimized learned progressive coding by approximating the optimal quantizer in the source space with dead-zone quantizers in an analytic manner on the Laplacian source. To the best of our knowledge, this paper is the first to achieve general learned progressive coding from the perspective of optimal quantizers. The proposed method achieves theoretically sound and practically efficient embedded quantization and learned progressive coding of latent representations with improved rate-distortion performance. It can also enable embedded quantization with diverse assignments of truncation points and support flexible configuration of quality layers of varying numbers and at varying target bit-rates. Furthermore, we successfully incorporate the proposed method into existing pre-trained fixed-rate models to realize progressive learned image compression without re-training. Experimental results demonstrate that the proposed method achieves state-of-the-art rate-distortion performance in learned progressive image compression compared with traditional codecs and recent learned methods.
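
One way to see why dead-zone quantizers suit embedded (progressive) quantization is the sketch below. It uses a plain scalar quantizer q = sign(x)·floor(|x|/Δ + r): with r = 0 (a dead zone around zero), dropping the least significant bit of each index magnitude gives exactly the indices of the quantizer with step 2Δ, because floor(floor(t)/2) = floor(t/2); with r = 0.5 (ordinary rounding) this generally fails. The step size and sample values are assumptions, and this is not the paper's analytic rate-distortion optimized design.

```python
import numpy as np

def deadzone_quantize(x: np.ndarray, step: float, rounding: float = 0.0) -> np.ndarray:
    """Scalar quantizer q = sign(x) * floor(|x| / step + rounding).

    rounding = 0.5 is ordinary round-to-nearest (uniform) quantization;
    rounding = 0.0 widens the zero bin to (-step, step), i.e. a dead zone.
    """
    return np.sign(x) * np.floor(np.abs(x) / step + rounding)

def is_embedded(x: np.ndarray, step: float, rounding: float) -> bool:
    """Check whether halving the fine index magnitudes (dropping one bit
    plane) reproduces the indices of the quantizer with step 2*step."""
    fine = deadzone_quantize(x, step, rounding)
    coarse = deadzone_quantize(x, 2 * step, rounding)
    return bool(np.array_equal(np.sign(fine) * (np.abs(fine) // 2), coarse))

x = np.array([-2.7, -0.4, 0.3, 0.6, 1.2, 5.9])
print(is_embedded(x, step=0.5, rounding=0.0))  # True: dead-zone quantizer
print(is_embedded(x, step=0.5, rounding=0.5))  # False: uniform rounding
```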

  • Conference Article
  • Citations: 257
  • 10.1109/cvpr52688.2022.01697
The Devil Is in the Details: Window-based Attention for Image Compression
  • Jun 1, 2022
  • Renjie Zou + 2 more

Learned image compression methods have exhibited better rate-distortion performance than classical image compression standards. Most existing learned image compression models are based on Convolutional Neural Networks (CNNs). Despite their great contributions, a main drawback of CNN-based models is that their structure is not designed for capturing local redundancy, especially non-repetitive textures, which severely affects the reconstruction quality. Therefore, how to make full use of both global structure and local texture becomes the core problem for learning-based image compression. Inspired by recent progress on the Vision Transformer (ViT) and Swin Transformer, we found that combining a local-aware attention mechanism with global-related feature learning could meet the expectations of image compression. In this paper, we first extensively study the effects of multiple kinds of attention mechanisms for local feature learning, then introduce a more straightforward yet effective window-based local attention block. The proposed window-based attention is very flexible and can work as a plug-and-play component to enhance CNN and Transformer models. Moreover, we propose a novel Symmetrical TransFormer (STF) framework with absolute transformer blocks in the down-sampling encoder and up-sampling decoder. Extensive experimental evaluations show that the proposed method is effective and outperforms the state-of-the-art methods. The code is publicly available at https://github.com/Googolxx/STF.
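
The core of window-based attention, restricting self-attention to non-overlapping windows, can be sketched in NumPy as follows. To stay short, the sketch assumes H and W are divisible by the window size and uses the raw features as queries, keys, and values; the learned projections, multiple heads, position biases, and shifted windows of a full Swin-style block, and the STF paper's exact block, are not reproduced.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(x: np.ndarray, win: int) -> np.ndarray:
    """Self-attention inside non-overlapping win x win windows of x (H, W, C)."""
    H, W, C = x.shape
    # Partition: (H, W, C) -> (num_windows, win*win, C).
    xw = x.reshape(H // win, win, W // win, win, C).transpose(0, 2, 1, 3, 4)
    xw = xw.reshape(-1, win * win, C)
    # Scaled dot-product attention computed independently per window.
    attn = softmax(xw @ xw.transpose(0, 2, 1) / np.sqrt(C))
    out = attn @ xw
    # Reverse the partition back to (H, W, C).
    out = out.reshape(H // win, W // win, win, win, C).transpose(0, 2, 1, 3, 4)
    return out.reshape(H, W, C)

x = np.random.default_rng(0).standard_normal((8, 8, 16))
print(window_attention(x, win=4).shape)  # (8, 8, 16)
```

Because each window is processed independently, changing a pixel in one window leaves the outputs of all other windows untouched, and the attention cost grows linearly with the number of windows rather than quadratically with H·W.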

  • Conference Article
  • Citations: 112
  • 10.1109/cvpr.2019.01031
Learning Image and Video Compression Through Spatial-Temporal Energy Compaction
  • Jun 1, 2019
  • Zhengxue Cheng + 3 more

Compression has been an important research topic for many decades because of its significant impact on data transmission and storage. Recent advances have shown the great potential of learned image and video compression. Inspired by related work, in this paper we present an image compression architecture using a convolutional autoencoder, and then generalize image compression to video compression by adding an interpolation loop to both the encoder and decoder sides. Our basic idea is to realize spatial-temporal energy compaction in learned image and video compression. To this end, we propose adding a spatial energy compaction-based penalty to the loss function to achieve higher image compression performance. Furthermore, based on the temporal energy distribution, we propose selecting the number of frames in one interpolation loop to adapt to the motion characteristics of the video content. Experimental results demonstrate that our proposed image compression outperforms the latest image compression standard in terms of MS-SSIM, and provides higher performance than state-of-the-art learned compression methods at high bit rates, which benefits from our spatial energy compaction approach. Meanwhile, our proposed video compression approach with temporal energy compaction can significantly outperform MPEG-4 and is competitive with the commonly used H.264. Both our image and video compression can produce more visually pleasing results than traditional standards.
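
The abstract does not give the form of the spatial energy compaction penalty, so the following is a purely hypothetical illustration of the idea: measure the per-channel energy of a (C, H, W) latent and penalize energy in later channels more heavily, so that minimizing the penalty pushes energy toward the first channels. The function names and the index-weighted form are assumptions, not the authors' loss.

```python
import numpy as np

def channel_energy(latent: np.ndarray) -> np.ndarray:
    """Energy (sum of squares) of each channel of a (C, H, W) latent."""
    return (latent ** 2).sum(axis=(1, 2))

def compaction_penalty(latent: np.ndarray) -> float:
    """Hypothetical penalty: energy-weighted mean channel index (1-based).

    Equals 1.0 when all energy sits in channel 0 and grows as energy moves
    to later channels. Assumes the latent is not identically zero.
    """
    energy = channel_energy(latent)
    weights = np.arange(1, latent.shape[0] + 1, dtype=float)
    return float((weights * energy).sum() / energy.sum())

compact = np.zeros((4, 2, 2))
compact[0] = 1.0
print(compaction_penalty(compact), compaction_penalty(np.ones((4, 2, 2))))
```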

  • Dissertation
  • 10.33915/etd.13084
Neural Network-based Image Compression
  • Jan 1, 2025
  • Atefeh Khoshkhahtinat

The rapid advancement of information technology and the exponential growth of digital communication have significantly increased the demand for efficient data compression techniques that reduce storage requirements, minimize bandwidth consumption, and accelerate data transmission—without substantially compromising data quality. This dissertation addresses these challenges by investigating and developing advanced learned image compression (LIC) methods, with a particular focus on lossy compression for both natural images and scientific imagery obtained from NASA’s Solar Dynamics Observatory (SDO) mission. Traditional image compression standards—such as JPEG, JPEG2000, BPG, and HEVC—rely on manually engineered transforms and heuristic rules, which often lack the adaptability required to accommodate diverse visual content and application-specific constraints. In contrast, learned image compression employs deep neural networks trained in an end-to-end manner, guided by principles from rate–distortion theory, to optimize the trade-off between compression efficiency and reconstruction fidelity. In the first part of this dissertation, several technical challenges in developing neural image compression codecs for natural images (general-purpose) are addressed, including the design of expressive nonlinear transforms, accurate entropy modeling, and the integration of perceptually meaningful loss functions. To this end, several learned image compression frameworks are proposed, each introducing distinct design innovations: a Transformer-based nonlinear transform that captures both local and global dependencies, an advanced entropy model that improves probability estimation and coding efficiency, and a conditional diffusion-based generative framework that enhances the perceptual quality of reconstructed images. The second part focuses on the application of learned compression to imagery from NASA’s Solar Dynamics Observatory (SDO) mission. 
A learned video compression framework is developed to exploit both spatial and temporal redundancies in solar image sequences. Furthermore, an adaptive compression strategy is introduced to prioritize scientific relevance: images containing solar flare events are compressed at lower ratios to preserve critical information, whereas non-flare images are compressed more aggressively to maximize storage and transmission efficiency. Collectively, these contributions advance the field of learned image compression across both general-purpose and scientific imaging domains, providing practical solutions for improving data transmission and storage efficiency in real-world and mission-critical environments.
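
Both parts of the dissertation rest on the rate-distortion objective L = R + λ·D. A minimal sketch follows, with rate R in bits per pixel and distortion D as mean squared error. The flare-dependent choice of λ is a hypothetical illustration of an adaptive strategy (a larger λ weights distortion more, so the codec spends more bits and compresses less), not the dissertation's actual mechanism, and the λ values are arbitrary.

```python
import numpy as np

def rd_loss(bits: float, num_pixels: int, original: np.ndarray,
            recon: np.ndarray, lam: float) -> float:
    """Rate-distortion objective L = R + lam * D.

    R is the rate in bits per pixel; D is the mean squared error.
    """
    rate = bits / num_pixels
    distortion = float(np.mean((original - recon) ** 2))
    return rate + lam * distortion

def pick_lambda(is_flare: bool, lam_flare: float = 0.05,
                lam_quiet: float = 0.005) -> float:
    """Hypothetical policy: weight distortion more for flare images, which
    steers the codec toward higher rate (lower compression ratio)."""
    return lam_flare if is_flare else lam_quiet
```

In practice λ is usually fixed per trained model, so a policy like this would more likely select among models trained at different λ values than change λ per image at run time.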

  • Research Article
  • Citations: 5
  • 10.1016/j.jvcir.2023.103990
Corner-to-Center long-range context model for efficient learned image compression
  • Nov 24, 2023
  • Journal of Visual Communication and Image Representation
  • Yang Sui + 6 more

  • Research Article
  • Citations: 33
  • 10.1109/tnnls.2021.3104974
Learning Context-Based Nonlocal Entropy Modeling for Image Compression
  • Mar 1, 2023
  • IEEE Transactions on Neural Networks and Learning Systems
  • Mu Li + 5 more

The entropy of the codes usually serves as the rate loss in recent learned lossy image compression methods. Precise estimation of the probability distribution of the codes plays a vital role in reducing the entropy and boosting the joint rate-distortion performance. However, existing deep learning-based entropy models generally assume that the latent codes are statistically independent or depend on some side information or local context, which fails to take the global similarity within the context into account and thus hinders accurate entropy estimation. To address this issue, we propose a special nonlocal operation for context modeling that employs the global similarity within the context. Specifically, due to the constraints of the context, the nonlocal operation cannot be computed directly in context modeling. We exploit the relationship between the code maps produced by deep neural networks and introduce proxy similarity functions as a workaround. Then, we combine the local and global context via a nonlocal attention block and employ it in masked convolutional networks for entropy modeling. Considering that the width of the transforms is essential in training low-distortion models, we finally introduce a U-net block into the transforms to increase the width with manageable memory consumption and time complexity. Experiments on the Kodak and Tecnick datasets demonstrate the superiority of the proposed context-based nonlocal attention block in entropy modeling and of the U-net block in low-distortion settings. On the whole, our model performs favorably against existing image compression standards and recent deep image compression models.
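
Masked convolutional context models, which this work builds on, rely on a causal mask so that each latent is predicted only from positions already decoded in raster-scan order. A minimal sketch of a PixelCNN-style "type A" mask follows; the nonlocal attention block and proxy similarity functions from the paper are not shown, and the kernel size is an arbitrary choice.

```python
import numpy as np

def causal_mask(k: int) -> np.ndarray:
    """k x k mask for a PixelCNN-style 'type A' masked convolution.

    Ones mark neighbors already decoded in raster-scan order: every row
    above the center, and the positions left of the center in its row.
    The center and all later positions are zero.
    """
    mask = np.zeros((k, k))
    c = k // 2
    mask[:c, :] = 1.0  # rows above the center
    mask[c, :c] = 1.0  # left of the center in the center row
    return mask

weights = np.random.default_rng(0).standard_normal((5, 5))
masked_weights = weights * causal_mask(5)  # taps on undecoded latents are zeroed
print(causal_mask(3))
```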

  • Research Article
  • 10.1109/jetcas.2025.3538652
Learned Image Compression With Efficient Cross-Platform Entropy Coding
  • Jan 1, 2025
  • IEEE Journal on Emerging and Selected Topics in Circuits and Systems
  • Runyu Yang + 3 more

Learned image compression has shown remarkable compression efficiency gains over traditional image compression solutions, which is partially attributed to the learned entropy models and the adopted entropy coding engine. However, the inference of the entropy models and the sequential nature of the entropy coding both incur high time complexity. Meanwhile, neural network-based entropy models usually involve floating-point computations, which cause inconsistent probability estimation and decoding failures across platforms. We address these limitations by introducing an efficient, cross-platform entropy coding method, chain coding-based latent compression (CC-LC), into learned image compression. First, we leverage classic chain coding and carefully design a block-based entropy coding procedure, significantly reducing the number of coding symbols and thus the coding time. Second, since CC-LC is not based on neural networks, we propose a rate estimation network as a surrogate of CC-LC during end-to-end training. Third, we alternately train the analysis/synthesis networks and the rate estimation network for rate-distortion optimization, making the learned latents fit CC-LC. Experimental results show that our method achieves much lower time complexity than other learned image compression methods, ensures cross-platform consistency, and has compression efficiency comparable to BPG. Our code and models are publicly available at https://github.com/Yang-Runyu/CC-LC.
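
Chain coding, which CC-LC builds on, classically describes a connected path by a start point and a sequence of direction symbols. The sketch below shows the textbook 8-direction Freeman chain code on a toy path; it illustrates the classic technique only and is not CC-LC's block-based procedure or its rate estimation network.

```python
# Freeman 8-direction codes: 0 = +x (east), increasing counter-clockwise,
# with +y taken as "up" in this toy coordinate system.
DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}
INVERSE = {code: step for step, code in DIRECTIONS.items()}

def chain_code(points):
    """Encode an 8-connected path as (start point, list of direction codes)."""
    codes = [DIRECTIONS[(x1 - x0, y1 - y0)]
             for (x0, y0), (x1, y1) in zip(points, points[1:])]
    return points[0], codes

def decode(start, codes):
    """Rebuild the path by walking the direction codes from the start point."""
    path = [start]
    for code in codes:
        dx, dy = INVERSE[code]
        x, y = path[-1]
        path.append((x + dx, y + dy))
    return path

path = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 2)]
start, codes = chain_code(path)
print(start, codes)  # (0, 0) [0, 1, 2, 4]
```

Each step costs one 3-bit symbol instead of a full coordinate pair, which is where the reduction in coding symbols comes from in the classic setting.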

  • Conference Article
  • Citations: 96
  • 10.1109/cvpr52688.2022.00590
Joint Global and Local Hierarchical Priors for Learned Image Compression
  • Jun 1, 2022
  • Jun-Hyuk Kim + 2 more

Recently, learned image compression methods have outperformed traditional hand-crafted ones including BPG. One of the keys to this success is learned entropy models that estimate the probability distribution of the quantized latent representation. As in other vision tasks, most recent learned entropy models are based on convolutional neural networks (CNNs). However, CNNs have a limitation in modeling long-range dependencies due to their nature of local connectivity, which can be a significant bottleneck in image compression where reducing spatial redundancy is a key point. To overcome this issue, we propose a novel entropy model called Information Transformer (Informer) that exploits both global and local information in a content-dependent manner using an attention mechanism. Our experiments show that Informer improves rate-distortion performance over the state-of-the-art methods on the Kodak and Tecnick datasets without the quadratic computational complexity problem. Our source code is available at https://github.com/naver-ai/informer.
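
One generic way to obtain global context at linear cost is to let a small, fixed set of K query vectors attend over all N latent positions, so the score matrix is K × N and the cost scales as O(NK) rather than O(N²). The sketch below illustrates that idea only; it is not a reproduction of Informer, and the query vectors (random here, learned in practice) and sizes are assumptions.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def global_tokens(latent: np.ndarray, queries: np.ndarray) -> np.ndarray:
    """Summarize N latent vectors into K global tokens by cross-attention.

    latent: (N, C) flattened latent positions; queries: (K, C).
    Scores form a (K, N) matrix, so cost is O(N*K) instead of O(N^2).
    """
    scores = queries @ latent.T / np.sqrt(latent.shape[1])  # (K, N)
    return softmax(scores, axis=1) @ latent                  # (K, C)

rng = np.random.default_rng(0)
latent = rng.standard_normal((64 * 64, 8))   # e.g. a 64x64 latent with 8 channels
queries = rng.standard_normal((4, 8))        # 4 hypothetical global queries
print(global_tokens(latent, queries).shape)  # (4, 8)
```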

  • Conference Article
  • 10.1109/pcs56426.2022.10018076
Tensor Network-Based Entropy Coding For Learned Image Compression
  • Dec 7, 2022
  • Xiaoxuan Fan + 5 more

Entropy coding is fundamental for reducing coding redundancy in image compression. However, existing entropy models for learned image compression are restricted to independent or autoregressive modeling based on presumed distributions from the family of Gaussian functions. In this paper, we propose a novel tensor network-based entropy model that can explicitly infer the joint distribution of the image representation for learned image compression. We utilize a tree tensor network (TTN) to enable exact computation of the probabilities of the image representation. Specifically, we construct an efficient tensor representation for entropy modeling based on bit-plane slicing with Gray code. Furthermore, we leverage tensor contraction to accurately calculate the partition function and jointly predict the entire bit plane for entropy coding. To the best of our knowledge, this paper is the first to directly infer the joint distribution of the image representation in learned image compression without any presumed condition. Experimental results demonstrate that the proposed method is competitive with the state of the art in both lossless and lossy image compression.
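
Bit-plane slicing with Gray code can be sketched as follows: each non-negative integer symbol is mapped to its binary-reflected Gray code g = x XOR (x >> 1) and then split into bit planes, most significant first, so that adjacent integers differ in exactly one plane. The tree tensor network that models the planes is not shown, and the 3-bit symbol range is an assumption.

```python
import numpy as np

def gray_bit_planes(symbols: np.ndarray, num_bits: int) -> np.ndarray:
    """Gray-code non-negative integer symbols and slice them into bit planes.

    Returns shape (num_bits, *symbols.shape), most significant plane first.
    """
    gray = symbols ^ (symbols >> 1)  # binary-reflected Gray code
    shifts = np.arange(num_bits - 1, -1, -1).reshape(-1, *([1] * symbols.ndim))
    return (gray[None, ...] >> shifts) & 1

planes = gray_bit_planes(np.arange(8), num_bits=3)
print(planes)
```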

  • Research Article
  • Citations: 4
  • 10.1016/j.neunet.2025.107590
S2LIC: Learned image compression with the SwinV2 block, Adaptive Channel-wise and Global-inter attention Context.
  • Sep 1, 2025
  • Neural networks : the official journal of the International Neural Network Society
  • Yongqiang Wang + 5 more

  • Research Article
  • Citations: 20
  • 10.3390/rs15082211
Remote Sensing Image Compression Based on the Multiple Prior Information
  • Apr 21, 2023
  • Remote Sensing
  • Chuan Fu + 1 more

Learned image compression has achieved a series of breakthroughs for natural images, but little literature focuses on high-resolution remote sensing image (HRRSI) datasets. This paper focuses on designing a learned lossy image compression framework for compressing HRRSIs. Considering the local and non-local redundancy contained in HRRSIs, a mixed hyperprior network is designed to explore both in order to improve the accuracy of entropy estimation. In detail, a transformer-based hyperprior and a CNN-based hyperprior are fused for entropy estimation. Furthermore, to reduce the mismatch between training and testing, a three-stage training strategy is introduced to refine the network. In this training strategy, the entire network is first trained, and then some sub-networks are fixed while the others are trained. To evaluate the effectiveness of the proposed compression algorithm, experiments are conducted on an HRRSI dataset. The results show that the proposed algorithm achieves comparable or better compression performance than some traditional and learned image compression algorithms, such as Joint Photographic Experts Group (JPEG) and JPEG2000. At a similar or lower bitrate, the PSNR of the proposed algorithm is about 2 dB higher than that of JPEG2000.
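
PSNR, used above to compare against JPEG2000, is 10·log10(MAX²/MSE) in dB. A minimal sketch for 8-bit images (MAX = 255, an assumption about the data) follows. Since PSNR is logarithmic in MSE, a 2 dB gain corresponds to dividing the MSE by 10^0.2 ≈ 1.58, i.e. roughly 37% lower MSE.

```python
import numpy as np

def psnr(original: np.ndarray, recon: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val**2 / MSE)."""
    diff = original.astype(np.float64) - recon.astype(np.float64)  # avoid uint8 wraparound
    mse = float(np.mean(diff ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

img = np.zeros((4, 4), dtype=np.uint8)
print(psnr(img, np.ones((4, 4), dtype=np.uint8)))  # MSE = 1, about 48.13 dB
```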

  • Research Article
  • 10.1109/tcsvt.2024.3522621
Sparse Point Clouds Assisted Learned Image Compression
  • May 1, 2025
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Yiheng Jiang + 4 more

In the field of autonomous driving, a variety of sensor data types exist, each representing different modalities of the same scene. Therefore, it is feasible to utilize data from other sensors to facilitate image compression. However, few techniques have explored the potential benefits of utilizing inter-modality correlations to enhance the image compression performance. In this paper, motivated by the recent success of learned image compression, we propose a new framework that uses sparse point clouds to assist in learned image compression in the autonomous driving scenario. We first project the 3D sparse point cloud onto a 2D plane, resulting in a sparse depth map. Utilizing this depth map, we proceed to predict camera images. Subsequently, we use these predicted images to extract multi-scale structural features. These features are then incorporated into learned image compression pipeline as additional information to improve the compression performance. Our proposed framework is compatible with various mainstream learned image compression models, and we validate our approach using different existing image compression methods. The experimental results show that incorporating point cloud assistance into the compression pipeline consistently enhances the performance.
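
The first step described above, projecting a 3D point cloud onto the image plane to obtain a sparse depth map, can be sketched with a pinhole camera model. The sketch assumes the points are already in camera coordinates (z pointing forward) and that the 3x3 intrinsic matrix K is known; LiDAR-to-camera extrinsics, lens distortion, and the paper's subsequent image prediction and feature extraction are omitted.

```python
import numpy as np

def project_to_depth(points_cam: np.ndarray, K: np.ndarray,
                     height: int, width: int) -> np.ndarray:
    """Project 3D points in camera coordinates (z forward) into a sparse
    depth map using a pinhole model with 3x3 intrinsics K.

    Pixels that receive no point stay 0; if several points fall on the same
    pixel, the nearest (smallest z) depth is kept.
    """
    pts = points_cam[points_cam[:, 2] > 0]          # drop points behind the camera
    uvw = pts @ K.T                                  # homogeneous pixel coordinates
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf)
    # Unbuffered minimum handles repeated pixel indices correctly.
    np.minimum.at(depth, (v[inside], u[inside]), pts[inside, 2])
    depth[np.isinf(depth)] = 0.0
    return depth
```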

  • Research Article
  • Citations: 159
  • 10.1109/tcsvt.2021.3089491
Causal Contextual Prediction for Learned Image Compression
  • Apr 1, 2022
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Zongyu Guo + 3 more

Over the past several years, we have witnessed impressive progress in the field of learned image compression. Recent learned image codecs are commonly based on autoencoders, which first encode an image into low-dimensional latent representations and then decode them for reconstruction. To capture spatial dependencies in the latent space, prior works exploit a hyperprior and a spatial context model to build an entropy model, which estimates the bit-rate for end-to-end rate-distortion optimization. However, such an entropy model is suboptimal in two respects: (1) it fails to capture spatially global correlations among the latents; (2) cross-channel relationships of the latents are still underexplored. In this paper, we propose the concept of separate entropy coding, which leverages a serial decoding process for causal contextual entropy prediction in the latent space. A causal context model is proposed that separates the latents across channels and makes use of cross-channel relationships to generate highly informative contexts. Furthermore, we propose a causal global prediction model, which is able to find global reference points for accurate predictions of unknown points. Both models facilitate entropy estimation without additional transmission overhead. In addition, we adopt a new separate attention module to build more powerful transform networks. Experimental results demonstrate that our full image compression model outperforms the standard VVC/H.266 codec on the Kodak dataset in terms of both PSNR and MS-SSIM, yielding state-of-the-art rate-distortion performance.
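
The channel-separated causal context described above can be sketched as follows: channels are split into contiguous slices that are decoded in order, and each slice's entropy parameters are predicted only from slices already decoded. The slice count and the `predict` callback (a stand-in for a learned context network) are hypothetical, and the paper's causal global prediction model is not shown.

```python
import numpy as np

def channel_slices(num_channels: int, num_slices: int) -> list:
    """Split channel indices 0..num_channels-1 into contiguous slices."""
    return np.array_split(np.arange(num_channels), num_slices)

def causal_slice_params(latent: np.ndarray, num_slices: int, predict) -> list:
    """Visit channel slices in order; slice i is predicted from slices < i.

    latent: (C, H, W). predict: hypothetical stand-in for a learned context
    network, mapping the already-decoded channels (possibly zero of them)
    to entropy parameters for the current slice.
    """
    decoded = []
    params = []
    for idx in channel_slices(latent.shape[0], num_slices):
        # Only previously decoded slices are visible, which keeps it causal.
        context = np.concatenate(decoded, axis=0) if decoded else latent[:0]
        params.append(predict(context))
        # A real decoder would entropy-decode latent[idx] here using params[-1].
        decoded.append(latent[idx])
    return params

print(causal_slice_params(np.zeros((8, 2, 2)), 4, lambda ctx: ctx.shape[0]))  # [0, 2, 4, 6]
```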
