Articles published on Learned Image Compression
123 Search results
- Research Article
- 10.1016/j.patrec.2026.02.028
- May 1, 2026
- Pattern Recognition Letters
- Yuqing Yang + 2 more
Task-Driven learned image compression with explainability preservation for image classification
- Research Article
- 10.1145/3803542
- Mar 25, 2026
- ACM Transactions on Multimedia Computing, Communications, and Applications
- Jian Wang + 1 more
Recently, learned image compression models have achieved better compression performance than traditional non-learning image compression standards. These learned models usually use spatial self-attention and CNNs to extract non-local and local features and generate the latent representation. However, previous methods adopt a linear layer to fuse non-local and local features, and therefore lack the flexibility to adaptively adjust feature weights and capture complex non-linear interactions between distinct feature representations. Additionally, how to compress the latent representation more effectively based on its channel similarity characteristics remains unexplored. To solve these issues, we propose a novel image compression method with frequency feature interaction and a non-local cross-similarity prior. More specifically, we extend the previous spatial self-attention module and alternately use spatial and channel self-attention modules to extract non-local spatial and channel features, respectively, while depth-wise convolution is used to extract local features. As local features focus on high-frequency detail information and non-local features concentrate on low-frequency structural information, we propose a frequency interaction module (FIM) that generates two weight maps to dynamically fuse non-local and local features. Moreover, we observe non-local cross-similarity across different channels of the latent representation, which indicates that different channels share similar non-local semantic and structural information but have distinct local detail information. We therefore design a dual transformer entropy model to emphasize non-local features and remove local features. Experimental results validate that our method achieves promising compression performance on the Kodak, CLIC and Tecnick datasets.
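The fusion idea in this abstract (two weight maps instead of one fixed linear layer) can be illustrated with a minimal sketch. This is not the authors' FIM: the function names, the use of independent sigmoid gates, and the gate values passed in by hand are all assumptions made for illustration. In a real network the gates would be predicted from the features themselves.

```python
import math

def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(local_feat, nonlocal_feat, gate_local, gate_nonlocal):
    """Fuse two flattened feature maps element-wise with two weight maps.

    gate_local / gate_nonlocal are hypothetical pre-activation gate values
    with the same length as the features (an assumption of this sketch).
    """
    fused = []
    for l, n, gl, gn in zip(local_feat, nonlocal_feat, gate_local, gate_nonlocal):
        w_local = sigmoid(gl)      # weight map for high-frequency (local) detail
        w_nonlocal = sigmoid(gn)   # weight map for low-frequency (non-local) structure
        fused.append(w_local * l + w_nonlocal * n)
    return fused

# Equal gates give an even blend; skewed gates favour one branch per position.
even = gated_fusion([2.0], [4.0], [0.0], [0.0])
mostly_local = gated_fusion([2.0], [4.0], [50.0], [-50.0])
```

Because each position gets its own pair of weights, the blend can differ between textured and smooth regions, which a single shared linear layer cannot do.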
- Research Article
- 10.1080/13682199.2026.2641383
- Mar 10, 2026
- The Imaging Science Journal
- Zelin Lei + 3 more
Efficient image compression is critical for real-time visual transmission, yet learning-based methods often suffer from high computational complexity. This paper proposes an efficient framework based on a State Space Model (SSM), which introduces a refined context modeling mechanism in the latent representation stage to enable long-range dependency modeling with linear computational complexity. We introduce a Dual-Route Attention (DRA) mechanism for channel space entropy modeling, which adaptively aggregates latent features through content-aware routing to minimize redundancy. Unlike existing methods that use SSM as generic backbones, our approach explicitly tailors SSM for entropy-oriented context modeling. Experimental results demonstrate a superior balance between rate-distortion performance and efficiency. Compared to state-of-the-art hybrid and Transformer-based architectures, our method reduces encoding/decoding time by up to 60-70% while improving BD-rate by 1.57%–2.96%. Furthermore, it outperforms existing SSM-based baselines by 4.63% in BD-rate under similar complexity, validating its effectiveness for high-quality, low-latency image compression.
- Research Article
- 10.6025/ed/2026/15/1/33-48
- Mar 1, 2026
- Electronic Devices
- Hajar Ait Lamkademe
This paper presents a systematic comparative analysis of entropy modeling strategies in learned image compression (LIC), evaluating hyperprior (HP), autoregressive (AR), and transformer-based (TR) approaches under a controlled experimental framework. Entropy modeling critically determines compression efficiency by estimating the probability distribution of latent representations, directly influencing the rate term in rate-distortion optimization. To isolate the impact of entropy modeling, all architectures share identical encoder-decoder backbones, latent dimensionality, and quantization schemes, with entropy modeling as the sole variable. Results reveal a clear hierarchy in entropy modeling accuracy, quantified by cross-entropy gap: hyperprior models exhibit the largest gap due to limited spatial dependency capture; autoregressive models substantially reduce this gap by leveraging causal local context; and transformer-based models achieve the smallest gap by exploiting long-range global dependencies, particularly benefiting high-complexity content. However, improved accuracy entails significant computational trade-offs. Context utilization efficiency analysis shows autoregressive models excel with small contexts but face diminishing returns with larger ones. Crucially, decoder-centric complexity emerges as a decisive practical constraint. Hyperprior models enable parallel decoding with minimal latency and linear scaling, making them ideal for latency-sensitive applications. Autoregressive models suffer from strictly sequential decoding, resulting in super-linear latency growth with resolution, rendering them impractical for real-time or high-resolution scenarios. Transformer-based models offer superior compression gains but incur high memory demands and quadratic complexity in global attention configurations; however, configurable attention mechanisms enable controllable performance-complexity trade-offs. Rate-distortion-complexity Pareto analysis confirms no single approach dominates universally: hyperpriors excel in low-complexity regimes, transformers lead in high-quality compression, and autoregressive models occupy an intermediate position. The study concludes that entropy modeling selection must balance compression efficiency against decoder feasibility, with scalable context utilization being critical for real-
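The "cross-entropy gap" used in this abstract can be made concrete with a small worked example. The sketch below assumes the usual information-theoretic reading: the gap is the cross-entropy of the entropy model against the true symbol distribution minus the true entropy, i.e. the extra bits per symbol paid for an imperfect model. The distributions and variable names are made up for illustration and do not come from the paper.

```python
import math

def entropy_bits(p):
    """Shannon entropy of distribution p, in bits per symbol."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy_bits(p, q):
    """Expected code length (bits/symbol) when data follows p but is coded with model q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical true distribution of quantized latent symbols.
p_true = [0.5, 0.25, 0.125, 0.125]

# Two hypothetical entropy models: a weak one (uniform) and a closer one.
q_weak = [0.25, 0.25, 0.25, 0.25]
q_strong = [0.45, 0.25, 0.15, 0.15]

gap_weak = cross_entropy_bits(p_true, q_weak) - entropy_bits(p_true)
gap_strong = cross_entropy_bits(p_true, q_strong) - entropy_bits(p_true)
print(f"gap (weak model):   {gap_weak:.4f} bits/symbol")
print(f"gap (strong model): {gap_strong:.4f} bits/symbol")
```

The gap is never negative and shrinks as the model approaches the true distribution, which is the sense in which a better context model lowers the rate term.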
- Research Article
- 10.1016/j.dsp.2025.105797
- Feb 1, 2026
- Digital Signal Processing
- Kai Hu + 4 more
Efficient learned image compression with dual-space aggregation transformer
- Research Article
- 10.1016/j.jvcir.2025.104643
- Jan 1, 2026
- Journal of Visual Communication and Image Representation
- Chao Li + 6 more
Turbo principles meet compression: Rethinking nonlinear transformations in learned image compression
- Research Article
- 10.1109/tmm.2026.3651136
- Jan 1, 2026
- IEEE Transactions on Multimedia
- Wenhong Duan + 6 more
Learned image compression (LIC) methods have shown promising results and achieved superior performance compared to traditional image compression methods. Because cross-component correlations are typically not exploited, there is still potential for further performance improvement. In this paper, we first explore the inter-channel correlations of different color spaces and transform the image compression problem from the RGB color space into the YUV color space, which carries cross-component prior information. We propose a novel image compression method that leverages local-to-global cross-component prior modeling, using a cross-component attention mechanism to improve coding performance. First, we design the cross-component prior gate (CPG) to model the cross-component prior information based on an attention mechanism. Following common knowledge in data compression, the luma component (Y) contains more detail and textural/structural information than the chroma components (UV). The proposed method can make full use of the cross-component guidance information from luma to chroma components to achieve effective image compression. Experimental results demonstrate that the proposed method achieves superior performance compared to existing learned image compression methods, with 9.20% rate savings compared to the image compression standard Versatile Video Coding (VVC) Test Model (VTM-11.0) on the Kodak dataset.
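The RGB-to-YUV step mentioned in this abstract can be sketched as follows. The abstract does not say which YUV variant the authors use; this example assumes the classic BT.601 analog definition (luma weights 0.299/0.587/0.114, chroma scale factors 0.492 and 0.877) with inputs in the range [0, 1]. The function names are hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (floats in [0, 1]) to YUV, assuming BT.601 weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma: weighted sum of the three channels
    u = 0.492 * (b - y)                     # chroma: scaled blue difference
    v = 0.877 * (r - y)                     # chroma: scaled red difference
    return y, u, v

def yuv_to_rgb(y, u, v):
    """Inverse of rgb_to_yuv under the same BT.601 assumption."""
    r = y + v / 0.877
    b = y + u / 0.492
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

# A neutral gray pixel carries all of its information in Y; U and V vanish.
gray = rgb_to_yuv(0.5, 0.5, 0.5)
# A saturated pixel survives the round trip up to floating-point error.
restored = yuv_to_rgb(*rgb_to_yuv(0.2, 0.6, 0.9))
```

The gray-pixel case shows why luma is a natural guide for chroma: structure concentrates in Y, while U and V carry comparatively little.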
- Research Article
- 10.1109/tgrs.2026.3668020
- Jan 1, 2026
- IEEE Transactions on Geoscience and Remote Sensing
- Dongyang Liu + 6 more
With the rapid advancement of remote sensing satellites toward higher spatial resolution and revisit frequency, the explosive growth of image data has posed severe challenges to the efficiency of space-to-ground transmission. Traditional onboard compression standards, such as JPEG2000, often fail to maintain satisfactory reconstruction quality under high compression ratios, limiting their applicability in large-scale remote sensing scenarios. Although learned image compression (LIC) methods have achieved remarkable improvements in rate-distortion (RD) performance, their high computational complexity hinders deployment on resource-constrained onboard platforms. To address these challenges, this paper proposes RS-LLIC, a lightweight learned image compression framework with knowledge distillation tailored for onboard remote sensing, following the “onboard encoding and ground decoding” paradigm. Specifically, an efficient encoder architecture is designed to significantly reduce onboard computational costs, while a knowledge distillation-based training strategy is introduced to guide the lightweight encoder in feature learning using a teacher model, thereby improving RD performance without incurring additional inference overhead. Experimental results on multiple remote sensing datasets demonstrate that the proposed RS-LLIC achieves superior compression performance with extremely low encoder complexity, providing an effective solution for high-quality and efficient onboard remote sensing image compression. The code will be released on https://github.com/dy196/RS-LLIC.
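A distillation-guided training objective of the kind described here might look like the sketch below. The abstract does not give the loss, so everything in it is an assumption: a standard rate-distortion term plus a mean-squared feature-matching term between the lightweight (student) encoder and the teacher, with hypothetical weights `lam` and `alpha`.

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(rate_bpp, distortion, student_feat, teacher_feat,
                      lam=0.01, alpha=1.0):
    """Rate-distortion loss plus a feature-matching distillation term.

    lam and alpha are hypothetical trade-off weights (assumptions of this sketch).
    """
    rd_term = rate_bpp + lam * distortion     # usual R + lambda * D objective
    kd_term = mse(student_feat, teacher_feat) # pull student features toward the teacher
    return rd_term + alpha * kd_term

# The teacher is only needed to compute the loss during training, so it adds
# no cost at inference time: only the student encoder would run onboard.
matched = distillation_loss(0.5, 10.0, [1.0, 2.0], [1.0, 2.0], lam=0.1)
mismatched = distillation_loss(0.5, 10.0, [1.0, 2.0], [2.0, 4.0], lam=0.1)
```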
- Research Article
- 10.1109/tce.2026.3666930
- Jan 1, 2026
- IEEE Transactions on Consumer Electronics
- Hui Hu + 4 more
With the widespread adoption of consumer electronic devices such as virtual reality (VR) headsets, panoramic cameras, and ultra-high-definition displays, omnidirectional (360°) images have become increasingly important for providing immersive user experiences. However, the high resolution and data volume of these images pose significant challenges for bandwidth-limited and resource-constrained consumer electronics. To address these challenges, based on an advanced parallel dual-branch hybrid architecture (TCM) consisting of convolutional neural networks (CNNs) and Swin Transformer, we propose a dual-prompt learned variable bitrate omnidirectional image compression framework, termed DPVOC, which utilizes distortion maps (Dmaps) and quality maps (Qmaps) as dual prompts to enable region-adaptive bit allocation and achieve efficient variable bitrate compression. Specifically, during training, to alleviate the computational burden of processing entire ERP images, we randomly crop ERP images into patches as input to the network. Considering the varying degrees of distortion redundancy across different regions of ERP patches, we introduce corresponding Dmap patches to record the local distortion levels. In the CNN branch, the patch-wise uniform Qmaps are element-wise multiplied with the Dmaps to modulate the CNN features. In the Swin Transformer branch, the uniform Qmap patches are used as prompts in the attention mechanism to guide the feature embeddings for adaptability to bitrate variations. Additionally, Dmap patches are introduced into the feedforward network (FFN) of the Swin Transformer to suppress redundant information. By incorporating fine-grained and symmetric prompts from both Qmaps and Dmaps into the encoder and decoder through the dual-branch structure, our networks can effectively adapt to diverse bitrate requirements. During inference, entire Qmaps and Dmaps are used as inputs, and their bitrate overhead is negligible. 
Experimental results demonstrate that DPVOC achieves superior performance in omnidirectional image compression while maintaining low computational complexity.
- Research Article
- 10.1109/tmm.2026.3676828
- Jan 1, 2026
- IEEE Transactions on Multimedia
- Mazouz Alaa Eddine + 6 more
Security and Real-Time FPGA Integration for Learned Image Compression
- Research Article
- 10.1186/s13634-025-01268-x
- Dec 29, 2025
- EURASIP Journal on Advances in Signal Processing
- Ran Wang + 3 more
Recent advances in learned image compression (LIC) have demonstrated superior performance over traditional methods but often require training and storage of multiple models to handle different bitrate settings. In this paper, we propose the Uniform Spatial-Frequency Residual Bottleneck Modulation Adapter (U-SFRB), a plug-and-play, adapter-based framework for variable rate image compression that significantly reduces training and storage overhead. Our method freezes the backbone network and only trains lightweight adapters—Spatial-Frequency Residual Bottleneck Adapters (SFRBs)—to achieve rate adaptability. By inserting multiple SFRBs in parallel, our approach enables a single model to support a wide range of bitrates. Unlike prompt-based methods restricted to transformer architectures, our approach is compatible with both CNN- and transformer-based compression models. Experimental results on the Kodak and CLIC datasets show that our method achieves competitive rate-distortion performance compared to state-of-the-art variable rate compression approaches, with the advantage of lower training complexity and better model flexibility.
- Research Article
- 10.3390/jimaging12010012
- Dec 26, 2025
- Journal of Imaging
- Sibusiso B Buthelezi + 1 more
We present a hybrid end-to-end learned image compression framework that combines a CNN-based variational autoencoder (VAE) with an efficient hierarchical Swin Transformer to address the limitations of existing entropy models in capturing global dependencies under computational constraints. Traditional VAE-based codecs typically rely on CNN-based priors with localized receptive fields, which are insufficient for modelling the complex, high-dimensional dependencies of the latent space, thereby limiting compression efficiency. While fully global transformer-based models can capture long-range dependencies, their high computational complexity makes them impractical for high-resolution image compression. To overcome this trade-off, our approach couples a CNN-based VAE with a patch-based hierarchical Swin Transformer hyperprior that employs shifted window self-attention to effectively model both local and global contextual information while maintaining computational efficiency. The proposed framework tightly integrates this expressive entropy model with an end-to-end differentiable quantization module, enabling joint optimization of the complete rate-distortion objective. By learning a more accurate probability distribution of the latent representation, the model achieves improved bitrate estimation and a more compact latent representation, resulting in enhanced compression performance. We validate our approach on the widely used Kodak, JPEG AI, and CLIC datasets, demonstrating that the proposed hybrid architecture achieves superior rate-distortion performance, delivering higher visual quality at lower bitrates compared to methods relying on simpler CNN-based entropy priors. This work demonstrates the effectiveness of integrating efficient transformer architectures into learned image compression and highlights their potential for advancing entropy modelling beyond conventional CNN-based designs.
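The joint rate-distortion objective with differentiable quantization can be sketched in a few lines. This is not the authors' model: it assumes the common practice in learned compression of replacing rounding with additive uniform noise during training and estimating the rate from a Gaussian entropy model integrated over each quantization bin. All function and parameter names are hypothetical.

```python
import math
import random

def gaussian_cdf(x, mu, sigma):
    """CDF of a normal distribution N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bits_for_symbol(y_hat, mu, sigma):
    """Estimated bits for one latent: -log2 of the Gaussian mass in [y_hat-0.5, y_hat+0.5]."""
    p = gaussian_cdf(y_hat + 0.5, mu, sigma) - gaussian_cdf(y_hat - 0.5, mu, sigma)
    return -math.log2(max(p, 1e-12))  # clamp to avoid log(0) far from the mean

def quantize(y, training, rng=random):
    """Uniform-noise proxy during training, hard rounding at test time."""
    if training:
        return [v + rng.uniform(-0.5, 0.5) for v in y]
    return [float(round(v)) for v in y]

def rd_loss(y_hat, mus, sigmas, x, x_rec, lam):
    """Rate (bits) plus lam times distortion (mean squared error)."""
    rate = sum(bits_for_symbol(v, m, s) for v, m, s in zip(y_hat, mus, sigmas))
    dist = sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)
    return rate + lam * dist

hard = quantize([0.4, 1.6, -2.2], training=False)
soft = quantize([0.4, 1.6, -2.2], training=True, rng=random.Random(0))
```

A more accurate entropy model predicts `mu` and `sigma` closer to the actual latents, which lowers `bits_for_symbol` and therefore the rate term directly.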
- Research Article
- 10.1145/3785671
- Dec 22, 2025
- ACM Transactions on Multimedia Computing, Communications, and Applications
- Wei Jiang + 4 more
Recent advances in learned image compression (LIC) have achieved remarkable performance improvements over traditional codecs. Notably, the MLIC series (LICs equipped with multi-reference entropy models) have substantially surpassed conventional image codecs such as Versatile Video Coding (VVC) Intra. However, existing MLIC variants suffer from several limitations: performance degradation at high bit-rates due to insufficient transform capacity, suboptimal entropy modeling that fails to capture global correlations in initial slices, and lack of adaptive channel importance modeling. In this paper, we propose MLICv2 and MLICv2+, enhanced successors that systematically address these limitations through improved transform design, advanced entropy modeling, and exploration of the potential of instance-specific optimization. For transform enhancement, we introduce a lightweight token mixing block inspired by the MetaFormer architecture, which effectively mitigates high-bit-rate performance degradation while maintaining computational efficiency. For entropy modeling improvements, we propose hyperprior-guided global correlation prediction to extract global context even in the initial slice of latent representation, complemented by a channel reweighting module that dynamically emphasizes informative channels. We further explore enhanced positional embedding and guided selective compression strategies for superior context modeling. Additionally, we apply the Stochastic Gumbel Annealing (SGA) to demonstrate the potential for further performance improvements through input-specific optimization. Extensive experiments demonstrate that MLICv2 and MLICv2+ achieve state-of-the-art results, reducing Bjøntegaard-Delta Rate by 16.54%, 21.61%, 16.05% and 20.46%, 24.35%, 19.14% on Kodak, Tecnick, and CLIC Pro Val datasets, respectively, compared to VTM-17.0 Intra.
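Several abstracts in this listing report Bjøntegaard-Delta Rate (BD-rate) figures, so a small sketch of what that number measures may help. BD-rate is the average bitrate difference between two codecs at equal quality, computed as the mean gap between their log-rate curves over the shared quality range. The reference procedure fits a cubic polynomial (or a piecewise cubic) to each curve; the version below is a simplification that uses piecewise-linear interpolation instead, and the rate-distortion points are made up.

```python
import math

def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x is outside the curve's quality range")

def _integrate(xs, ys, lo, hi):
    """Exact integral of the piecewise-linear curve over [lo, hi] (trapezoids)."""
    knots = [lo] + [x for x in xs if lo < x < hi] + [hi]
    vals = [_interp(k, xs, ys) for k in knots]
    return sum((b - a) * (va + vb) / 2.0
               for a, b, va, vb in zip(knots, knots[1:], vals, vals[1:]))

def bd_rate_linear(anchor, test):
    """Simplified BD-rate in percent; inputs are (bpp, psnr) pairs sorted by PSNR."""
    a_q = [q for _, q in anchor]
    a_lr = [math.log10(r) for r, _ in anchor]
    t_q = [q for _, q in test]
    t_lr = [math.log10(r) for r, _ in test]
    lo, hi = max(a_q[0], t_q[0]), min(a_q[-1], t_q[-1])
    if hi <= lo:
        raise ValueError("curves do not overlap in quality")
    avg_diff = (_integrate(t_q, t_lr, lo, hi) - _integrate(a_q, a_lr, lo, hi)) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0

# Hypothetical curves: the test codec needs 20% fewer bits at every quality level.
anchor = [(0.2, 30.0), (0.4, 33.0), (0.8, 36.0), (1.6, 39.0)]
test = [(r * 0.8, q) for r, q in anchor]
print(f"BD-rate: {bd_rate_linear(anchor, test):.2f}%")  # about -20: fewer bits is negative
```

A negative BD-rate means the test codec saves bits relative to the anchor, so "reducing BD-rate by 16.54%" corresponds to a value of about -16.54 under this sign convention.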
- Research Article
- 10.1007/s11760-025-04978-9
- Dec 1, 2025
- Signal, Image and Video Processing
- Lingchen Qiu + 3 more
Learned image compression with dual frequency branch modulation
- Research Article
- 10.1016/j.jvcir.2025.104634
- Dec 1, 2025
- Journal of Visual Communication and Image Representation
- Fan Ye + 2 more
Variable-rate learned image compression with integer-arithmetic-only inference
- Research Article
- 10.3390/app152212151
- Nov 16, 2025
- Applied Sciences
- Yong-Hwan Lee + 1 more
We present a variable-rate learned image compression (LIC) model that integrates Transformer-based quantization–reconstruction (QR) offset prediction, entropy-guided hyper-latent quantization, and perceptually informed multi-objective optimization. Unlike existing LIC frameworks that train separate networks for each bitrate, the proposed method achieves continuous rate adaptation within a single model by dynamically balancing rate, distortion and perceptual objectives. Channel-wise asymmetric quantization and a composite loss combining MSE and LPIPS further enhance reconstruction fidelity and subjective quality. Experiments on the Kodak, CLIC2020 and Tecnick datasets show gains of +1.15 dB PSNR, +0.065 MS-SSIM, and −0.32 LPIPS relative to the baseline variable-rate method, while improving bitrate-control accuracy by 62.5%. With approximately 15% computational overhead, the framework achieves competitive compression efficiency and enhanced perceptual quality, offering a practical solution for adaptive, high-quality image delivery.
- Research Article
- 10.1007/s11554-025-01795-8
- Nov 4, 2025
- Journal of Real-Time Image Processing
- Yaohua Zhu + 5 more
Deep learning image compression with multi-channel tANS coding and hardware deployment
- Research Article
- 10.1016/j.image.2025.117325
- Oct 1, 2025
- Signal Processing: Image Communication
- Wen Tan + 4 more
Adaptive cross-channel transformation based on self-modulation for learned image compression
- Research Article
- 10.1145/3748654
- Sep 10, 2025
- ACM Transactions on Multimedia Computing, Communications, and Applications
- Junle Liu + 4 more
Feature Coding for Machines (FCM) aims to compress intermediate features effectively for remote intelligent analytics, which is crucial for future intelligent visual applications. In this article, we propose a Multiscale Feature Importance-based Bit Allocation (MFIBA) for end-to-end FCM. First, we find that the importance of features for machine vision tasks varies with the scales, object size, and image instances. Based on this finding, we propose a Multiscale Feature Importance Prediction (MFIP) module to predict the importance weight for each scale of features. Second, we propose a task loss-rate model to establish the relationship between the task accuracy losses of using compressed features and the bit rate of encoding these features. Finally, we develop an MFIBA for end-to-end FCM, which is able to assign coding bits of multiscale features more reasonably based on their importance. Experimental results demonstrate that when combined with a retained Efficient Learned Image Compression (ELIC), the proposed MFIBA achieves an average of 38.202% bit-rate savings in object detection compared to the anchor ELIC. Moreover, the proposed MFIBA achieves an average of 17.212% and 36.492% feature bit-rate savings for instance segmentation and keypoint detection, respectively. When the proposed MFIBA is applied to the LIC-TCM, it achieves an average of 18.103%, 19.866%, and 19.597% bit-rate savings on three machine vision tasks, respectively, which validates that the proposed MFIBA has good generalizability and adaptability to different machine vision tasks and FCM base codecs.
- Research Article
- 10.1016/j.neunet.2025.107590
- Sep 1, 2025
- Neural Networks: The Official Journal of the International Neural Network Society
- Yongqiang Wang + 5 more
S2LIC: Learned image compression with the SwinV2 block, Adaptive Channel-wise and Global-inter attention Context