Few-Shot Domain Adaptation for Learned Image Compression
Learned image compression (LIC) has achieved state-of-the-art rate-distortion performance and is considered a promising candidate for next-generation image compression. However, pre-trained LIC models usually suffer from significant performance degradation when applied to images outside their training domain, indicating poor generalization. To tackle this problem, we propose a few-shot domain adaptation method for LIC that integrates plug-and-play adapters into pre-trained models. Drawing inspiration from the analogy between latent channels and frequency components, we examine domain gaps in LIC and observe that out-of-training-domain images disrupt the pre-trained channel-wise decomposition. Consequently, we introduce a channel-wise re-allocation method using convolution-based adapters and low-rank adapters, which are lightweight and compatible with mainstream LIC schemes. Extensive experiments across multiple domains and multiple representative LIC schemes demonstrate that our method significantly enhances pre-trained models, achieving performance comparable to H.266/VVC intra coding with merely 25 target-domain samples. Additionally, our method matches the performance of full-model fine-tuning while transmitting fewer than 2% of the parameters.
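The low-rank adapters mentioned above follow the general LoRA pattern: a frozen pre-trained weight is corrected by the product of two thin trainable matrices, so only those matrices need to be trained and transmitted. The sketch below is a minimal, hypothetical illustration of that pattern, not the authors' code. It assumes a 1×1 convolution viewed as a C_out × C_in matrix; `lora_weight`, `adapter_param_ratio`, and the rank value are our own illustrative names and choices.

```python
import random

def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_weight(w_frozen, up, down, scale=1.0):
    """Effective weight W + scale * (up @ down); W stays frozen, only up/down train."""
    delta = matmul(up, down)  # (c_out x r) @ (r x c_in) -> c_out x c_in
    return [[w_frozen[i][j] + scale * delta[i][j] for j in range(len(w_frozen[0]))]
            for i in range(len(w_frozen))]

def adapter_param_ratio(c_in, c_out, rank):
    """Trainable adapter parameters relative to the frozen c_out x c_in weight."""
    return rank * (c_in + c_out) / (c_in * c_out)

random.seed(0)
c_in, c_out, rank = 6, 4, 2
w = [[random.gauss(0.0, 1.0) for _ in range(c_in)] for _ in range(c_out)]
down = [[random.gauss(0.0, 0.01) for _ in range(c_in)] for _ in range(rank)]
up = [[0.0] * rank for _ in range(c_out)]  # zero init: the adapter starts as a no-op
print(lora_weight(w, up, down) == w)        # True
print(adapter_param_ratio(320, 320, 8))     # 0.05 for an assumed 320-channel 1x1 conv
```

Initializing `up` to zero makes the adapted model start out identical to the pre-trained one. The 5% ratio printed here is for one assumed layer shape and says nothing about the under-2% figure reported in the abstract.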
- Research Article
16
- 10.1109/tcsvt.2022.3229701
- Jun 1, 2023
- IEEE Transactions on Circuits and Systems for Video Technology
Progressive coding is essential to the practical deployment of learned image compression over heterogeneous networks and clients. Existing methods for learned progressive image compression require complex, empirical design to achieve near-optimal rate-distortion performance over a wide range of bit-rates. Moreover, these methods are limited by their implicit, neural-network-based learned mechanisms and by their use of uniform quantizers. In this paper, we propose generalized learned progressive image compression with analytic rate-distortion optimization using dead-zone quantizers on the latent representation. Specifically, we reveal that dead-zone quantizers, as a generalization of uniform quantizers, are equivalent to uniform quantizers in fixed-rate nonlinear transform coding and can prevent extra redundancy in embedded quantization for progressive coding. Consequently, we propose rate-distortion optimized learned progressive coding that analytically approximates the optimal quantizer in the source space using dead-zone quantizers for a Laplacian source. To the best of our knowledge, this paper is the first to achieve general learned progressive coding from the perspective of optimal quantizers. The proposed method achieves theoretically sound and practically efficient embedded quantization and learned progressive coding of latent representations with improved rate-distortion performance. It also enables embedded quantization with diverse assignments of truncation points and supports flexible configuration of quality layers of varying numbers and at varying target bit-rates. Furthermore, we successfully incorporate the proposed method into existing pre-trained fixed-rate models to realize progressive learned image compression without re-training. Experimental results demonstrate that the proposed method achieves state-of-the-art rate-distortion performance in learned progressive image compression compared with traditional codecs and recent learned methods.
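Below is a minimal sketch of a dead-zone scalar quantizer, assuming the common parameterization sign(x)·floor(|x|/Δ + θ); θ and the helper names are our own, not the paper's. It checks two properties that help explain the abstract's argument: θ = 0.5 reproduces ordinary rounding (a uniform quantizer), and with θ = 0 the index at step 2Δ is obtained from the index at step Δ by dropping one magnitude bit, so coarse and fine layers nest without redundancy. The paper's analytic choice of quantizer parameters for a Laplacian source is not reproduced here.

```python
import math

def deadzone_index(x, step, theta=0.0):
    """Dead-zone scalar quantizer index: sign(x) * floor(|x| / step + theta).

    theta = 0.5 gives ordinary rounding (a uniform mid-tread quantizer);
    theta = 0 gives a zero bin twice as wide as the others (the dead zone)."""
    return int(math.copysign(math.floor(abs(x) / step + theta), x))

def deadzone_reconstruct(q, step, delta=0.5):
    """Map an index back to a point inside its bin (delta = 0.5 is the bin midpoint)."""
    if q == 0:
        return 0.0
    return math.copysign((abs(q) + delta) * step, q)

def drop_lsb(q):
    """Coarser index obtained by discarding the least significant magnitude bit."""
    return int(math.copysign(abs(q) // 2, q))

# With theta = 0, the index at step 2 equals the step-1 index with one bit dropped,
# so a coarse layer is a prefix of the fine one (embedded quantization).
for x in (-3.7, 0.9, 2.6, 5.1):
    fine, coarse = deadzone_index(x, 1.0), deadzone_index(x, 2.0)
    print(x, fine, coarse, drop_lsb(fine) == coarse)

# With theta = 0.5 (plain rounding) the bins do not nest: 1.4 -> fine 1, coarse 1, but 1 >> 1 = 0.
print(deadzone_index(1.4, 1.0, 0.5), deadzone_index(1.4, 2.0, 0.5))
```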
- Research Article
66
- 10.1109/tcsvt.2021.3119660
- Jun 1, 2022
- IEEE Transactions on Circuits and Systems for Video Technology
Recent works on learned image compression perform encoding and decoding in a full-resolution manner, which causes two problems in practical deployment. First, the autoregressive entropy model cannot be accelerated in parallel because decoding is serial. Second, full-resolution inference often causes out-of-memory (OOM) errors with limited GPU resources, especially for high-resolution images. Block partitioning is a good way to handle these issues, but it brings new challenges: reducing the redundancy between blocks and eliminating block artifacts. To tackle these challenges, this paper presents a learned block-based hybrid image compression (LBHIC) framework. Specifically, we introduce explicit intra prediction into a learned image compression framework to exploit the relations among adjacent blocks. Unlike context modeling via linear weighting of neighboring pixels in traditional codecs, we propose a contextual prediction module (CPM) that better captures long-range correlations by using strip pooling to extract the most relevant information from the neighboring latent space, thus achieving effective information prediction. Moreover, to alleviate blocking artifacts, we further propose a boundary-aware postprocessing module (BPM) that takes edge importance into account. Extensive experiments demonstrate that the proposed LBHIC codec outperforms VVC with a bit-rate saving of 4.1%, and reduces decoding time by approximately 86.7% compared with state-of-the-art learned image compression methods.
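Strip pooling, which the CPM above relies on, averages a feature map along whole rows and whole columns so that each position receives context from its entire row and column at linear cost. The simplified single-channel sketch below omits the learned 1D convolutions and gating that a complete strip-pooling module would typically add; the function names are hypothetical.

```python
def strip_pool(feature):
    """Average each row (horizontal strip) and each column (vertical strip)."""
    h, w = len(feature), len(feature[0])
    row_means = [sum(row) / w for row in feature]
    col_means = [sum(feature[i][j] for i in range(h)) / h for j in range(w)]
    return row_means, col_means

def strip_context(feature):
    """Broadcast both strips back to H x W and sum them, so every position
    sees a summary of its whole row and whole column (long-range context)."""
    row_means, col_means = strip_pool(feature)
    return [[r + c for c in col_means] for r in row_means]

latent = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
print(strip_pool(latent))
print(strip_context(latent))
```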
- Book Chapter
2
- 10.1007/978-3-031-19839-7_16
- Jan 1, 2022
In Cloud 3D applications, such as Cloud Gaming and Cloud Virtual Reality (VR), image frames are rendered and compressed (encoded) in the cloud and sent to clients for users to view. Low latency and high image quality call for image compression techniques that are fast, achieve high compression ratios, and preserve image quality. This paper explores computation time reduction techniques for learned image compression to make it more suitable for cloud 3D. More specifically, we employed slim (low-complexity), application-specific AI models to reduce computation time without degrading image quality. Our approach is based on two key insights: (1) because the frames generated by a 3D application are highly homogeneous, application-specific compression models can improve rate-distortion performance over a general model; (2) many computer-generated frames from 3D applications are less complex than natural photos, which makes it feasible to reduce model complexity to accelerate compression. We evaluated our models on six gaming image datasets. The results show that our approach achieves rate-distortion performance similar to a state-of-the-art learned image compression algorithm while obtaining about a 5x to 9x speedup and reducing compression time to under 1 s (0.74 s), bringing learned image compression closer to being viable for cloud 3D. Code is available at https://github.com/cloud-graphics-rendering/AppSpecificLIC.
Keywords: Cloud gaming; Cloud virtual reality; Learned image compression; Model simplification; Application-specific modeling; Model-task balance
- Research Article
8
- 10.1109/tcsvt.2024.3401872
- Oct 1, 2024
- IEEE Transactions on Circuits and Systems for Video Technology
In recent years, Learned Image Compression (LIC) has evolved rapidly. However, it is worth noting that most prevalent LIC methods still rely on uniform Scalar Quantization (SQ) for latent features. This overlooks the untapped potential of contextual information, which could be leveraged to significantly reduce statistical redundancy. Prior research has explored the adaptability of Vector Quantization (VQ) to diverse data distributions, yet VQ introduces significant computational complexity into LIC, hindering its practical implementation. Consequently, in this work, we propose the Contextual Sequential Quantization (CSQ) method, which progressively discretizes the latent features of LIC by harnessing contextual information from the content and image textural priors. Our proposed CSQ advances LIC by combining the computational efficiency of SQ with a substantial step toward the adaptability of VQ. We further propose the Center Compensation Module (CCM) based on the proposed CSQ. This module determines adaptive quantization centers, directly enhancing reconstruction quality without increasing the bit-rate. Moreover, existing LIC approaches struggle to leverage hyper side information to effectively enhance transformations, which is attributed to the entanglement of the hyperprior generation module with the main transformations. We therefore propose to decouple the hyperprior module from the main transformations, and design the Hyperprior-Assisted Transformation (HAT) unit to feed the hyperprior back into the main transformations. This further improves coding performance. By integrating the proposed CSQ, CCM, and HAT, our Non-uniform quantization-based LIC (NLIC) method attains state-of-the-art rate-distortion (R-D) performance among existing LIC methods.
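One generic way to see why adaptive quantization centers can help: if the indices (and hence the bit-rate) are kept fixed, replacing each bin's midpoint with the mean of the values that fall into it can only lower the mean squared error. The sketch below illustrates this centroid rule on toy data. It is not the paper's CSQ or CCM, and in a real codec the decoder would need to derive the centers itself (for example from context), which is ignored here.

```python
from collections import defaultdict

def quantize(values):
    """Uniform scalar quantization: map each value to its nearest integer index."""
    return [round(v) for v in values]

def centroid_centers(values, indices):
    """For every index, the mean of the values mapped to it.
    For a fixed partition this is the MSE-optimal reconstruction point."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, q in zip(values, indices):
        sums[q] += v
        counts[q] += 1
    return {q: sums[q] / counts[q] for q in sums}

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Toy values that sit off-centre inside their bins.
values = [0.1, 0.2, 0.3, 0.9, 1.2, 1.3]
idx = quantize(values)              # the transmitted indices, so the rate is unchanged
centers = centroid_centers(values, idx)
err_mid = mse(values, [float(q) for q in idx])
err_cen = mse(values, [centers[q] for q in idx])
print(err_mid, err_cen)
```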
- Research Article
20
- 10.3390/rs15082211
- Apr 21, 2023
- Remote Sensing
Learned image compression has achieved a series of breakthroughs for natural images, but little literature focuses on high-resolution remote sensing image (HRRSI) datasets. This paper focuses on designing a learned lossy image compression framework for HRRSIs. Considering the local and non-local redundancy contained in HRRSIs, a mixed hyperprior network is designed to exploit both in order to improve the accuracy of entropy estimation. In detail, a transformer-based hyperprior and a CNN-based hyperprior are fused for entropy estimation. Furthermore, to reduce the mismatch between training and testing, a three-stage training strategy is introduced to refine the network: the entire network is first trained, and then some sub-networks are fixed while the others are trained. To evaluate the effectiveness of the proposed compression algorithm, experiments are conducted on an HRRSI dataset. The results show that the proposed algorithm achieves comparable or better compression performance than some traditional and learned image compression algorithms, such as Joint Photographic Experts Group (JPEG) and JPEG2000. At a similar or lower bitrate, the proposed algorithm achieves a PSNR about 2 dB higher than that of JPEG2000.
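For reference, PSNR in dB is 10·log10(peak² / MSE). The minimal sketch below assumes 8-bit images (peak = 255); remote sensing data with a higher bit depth would need a different peak value. It also shows that a 2 dB PSNR gain corresponds to dividing the MSE by 10^0.2 ≈ 1.585, i.e., roughly 37% lower MSE.

```python
import math

def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two equally long sequences of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

# A PSNR gain of g dB means the MSE is divided by 10 ** (g / 10).
mse_ratio_for_2db = 10 ** (2 / 10)
print(round(mse_ratio_for_2db, 3))  # 1.585
print(psnr([0, 0], [1, 1]))
```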
- Conference Article
112
- 10.1109/cvpr.2019.01031
- Jun 1, 2019
Compression has been an important research topic for many decades, with significant impact on data transmission and storage. Recent advances have shown the great potential of learned image and video compression. Inspired by related work, in this paper we present an image compression architecture using a convolutional autoencoder, and then generalize image compression to video compression by adding an interpolation loop to both the encoder and decoder sides. Our basic idea is to realize spatial-temporal energy compaction in learned image and video compression. Thereby, we propose adding a spatial energy compaction-based penalty to the loss function to achieve higher image compression performance. Furthermore, based on the temporal energy distribution, we propose to select the number of frames in one interpolation loop adaptively according to the motion characteristics of the video content. Experimental results demonstrate that our proposed image compression outperforms the latest image compression standard in terms of MS-SSIM, and provides higher performance than state-of-the-art learned compression methods at high bit rates, which benefits from our spatial energy compaction approach. Meanwhile, our proposed video compression approach with temporal energy compaction can significantly outperform MPEG-4 and is competitive with the commonly used H.264. Both our image and video compression can produce more visually pleasing results than traditional standards.
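The abstract does not state the form of its energy compaction penalty, so the sketch below shows only the standard rate-distortion objective that such a penalty would be added to: rate estimated as -Σ log2 p over the entropy model's likelihoods, in bits per pixel, plus λ times the distortion. The `penalty` and `beta` arguments are placeholders for a caller-supplied extra term, not the paper's formulation.

```python
import math

def bits_per_pixel(likelihoods, num_pixels):
    """Estimated rate: total self-information -sum(log2 p), divided by pixel count."""
    return -sum(math.log2(p) for p in likelihoods) / num_pixels

def rd_loss(likelihoods, num_pixels, distortion, lam, penalty=0.0, beta=0.0):
    """Rate-distortion Lagrangian R + lam * D, plus an optional extra term
    beta * penalty supplied by the caller (e.g. an energy-compaction penalty)."""
    return bits_per_pixel(likelihoods, num_pixels) + lam * distortion + beta * penalty

# Eight latent symbols, each coded with probability 0.5, for a 4-pixel toy image.
print(bits_per_pixel([0.5] * 8, 4))
print(rd_loss([0.5] * 8, 4, distortion=0.01, lam=100.0))
```

Larger values of `lam` push training toward lower distortion at the cost of a higher rate.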
- Research Article
- 10.1109/tcsvt.2024.3522621
- May 1, 2025
- IEEE Transactions on Circuits and Systems for Video Technology
In the field of autonomous driving, a variety of sensor data types exist, each representing a different modality of the same scene. It is therefore feasible to use data from other sensors to facilitate image compression. However, few techniques have explored the potential benefits of inter-modality correlations for enhancing image compression performance. In this paper, motivated by the recent success of learned image compression, we propose a new framework that uses sparse point clouds to assist learned image compression in the autonomous driving scenario. We first project the 3D sparse point cloud onto a 2D plane, resulting in a sparse depth map. From this depth map, we then predict camera images. Subsequently, we use these predicted images to extract multi-scale structural features. These features are then incorporated into the learned image compression pipeline as additional information to improve compression performance. Our proposed framework is compatible with various mainstream learned image compression models, and we validate our approach using different existing image compression methods. The experimental results show that incorporating point cloud assistance into the compression pipeline consistently enhances performance.
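The first step above, projecting a sparse point cloud to a sparse depth map, can be sketched with a pinhole camera model: u = fx·X/Z + cx, v = fy·Y/Z + cy. This minimal sketch assumes the points have already been transformed into the camera coordinate frame (the LiDAR-to-camera extrinsics are not shown), that the intrinsics fx, fy, cx, cy are known, and that the nearest point wins when several land on the same pixel; these are our assumptions, not details given in the abstract.

```python
def project_to_depth_map(points, fx, fy, cx, cy, height, width):
    """Project camera-frame 3D points (X right, Y down, Z forward) with a pinhole
    model; keep the nearest depth per pixel and 0.0 where no point lands."""
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:  # behind the camera, cannot be imaged
            continue
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth

# Hypothetical intrinsics and a 5 x 5 image; the third point falls outside the image.
cloud = [(0.0, 0.0, 10.0), (0.0, 0.0, 5.0), (1.0, 0.0, 10.0), (0.1, 0.1, 10.0)]
for row in project_to_depth_map(cloud, 100.0, 100.0, 2.0, 2.0, 5, 5):
    print(row)
```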
- Conference Article
3
- 10.1109/icip40778.2020.9190974
- Oct 1, 2020
With recent advances in learned entropy and context models, the rate-distortion performance of deep learned image compression methods has reached or surpassed that of conventional codecs. However, learned image compression is currently more complex and slower than conventional image compression. Learned image and video compression methods almost exclusively employ the generalized divisive normalization (GDN) activation function. This paper investigates the effect of the activation function on image compression performance in terms of objective and subjective criteria as well as runtime. In particular, we show that the distribution of latents produced by hard shrinkage fits a Laplacian better, and that it is possible to achieve similar rate-distortion performance and better visual quality using hard shrinkage with lower complexity.
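The two activations compared above can be written in a few lines. GDN normalizes each channel by a learned combination of the squared responses of all channels at the same position, y_i = x_i / sqrt(β_i + Σ_j γ_ij x_j²), whereas hard shrinkage simply zeroes small inputs. The sketch below uses hypothetical parameter values and shows why hard shrinkage is cheaper: one comparison per element versus a weighted sum over all channels.

```python
import math

def hard_shrink(x, lam=0.5):
    """Hard shrinkage: keep x when |x| > lam, otherwise output zero."""
    return x if abs(x) > lam else 0.0

def gdn(xs, beta, gamma):
    """Generalized divisive normalization across channels at one spatial position:
    y_i = x_i / sqrt(beta_i + sum_j gamma[i][j] * x_j ** 2)."""
    return [x_i / math.sqrt(beta[i] + sum(gamma[i][j] * xs[j] ** 2 for j in range(len(xs))))
            for i, x_i in enumerate(xs)]

print([hard_shrink(v) for v in (-0.8, -0.2, 0.3, 1.5)])
print(gdn([3.0, 4.0], beta=[0.0, 0.0], gamma=[[1.0, 1.0], [1.0, 1.0]]))
```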
- Dissertation
- 10.33915/etd.13084
- Jan 1, 2025
The rapid advancement of information technology and the exponential growth of digital communication have significantly increased the demand for efficient data compression techniques that reduce storage requirements, minimize bandwidth consumption, and accelerate data transmission—without substantially compromising data quality. This dissertation addresses these challenges by investigating and developing advanced learned image compression (LIC) methods, with a particular focus on lossy compression for both natural images and scientific imagery obtained from NASA’s Solar Dynamics Observatory (SDO) mission. Traditional image compression standards—such as JPEG, JPEG2000, BPG, and HEVC—rely on manually engineered transforms and heuristic rules, which often lack the adaptability required to accommodate diverse visual content and application-specific constraints. In contrast, learned image compression employs deep neural networks trained in an end-to-end manner, guided by principles from rate–distortion theory, to optimize the trade-off between compression efficiency and reconstruction fidelity. In the first part of this dissertation, several technical challenges in developing neural image compression codecs for natural images (general-purpose) are addressed, including the design of expressive nonlinear transforms, accurate entropy modeling, and the integration of perceptually meaningful loss functions. To this end, several learned image compression frameworks are proposed, each introducing distinct design innovations: a Transformer-based nonlinear transform that captures both local and global dependencies, an advanced entropy model that improves probability estimation and coding efficiency, and a conditional diffusion-based generative framework that enhances the perceptual quality of reconstructed images. The second part focuses on the application of learned compression to imagery from NASA’s Solar Dynamics Observatory (SDO) mission. 
A learned video compression framework is developed to exploit both spatial and temporal redundancies in solar image sequences. Furthermore, an adaptive compression strategy is introduced to prioritize scientific relevance: images containing solar flare events are compressed at lower ratios to preserve critical information, whereas non-flare images are compressed more aggressively to maximize storage and transmission efficiency. Collectively, these contributions advance the field of learned image compression across both general-purpose and scientific imaging domains, providing practical solutions for improving data transmission and storage efficiency in real-world and mission-critical environments.
- Research Article
6
- 10.3390/rs17030425
- Jan 26, 2025
- Remote Sensing
In the past few years, deep learning has achieved remarkable advancements in the area of image compression. Remote sensing image compression networks focus on enhancing the similarity between the input and reconstructed images, effectively reducing the storage and bandwidth requirements for high-resolution remote sensing images. As the network's effective receptive field (ERF) expands, it can capture more feature information across the remote sensing images, thereby reducing spatial redundancy and improving compression efficiency. However, most of these learned image compression (LIC) techniques are CNN-based or transformer-based and often fail to balance a global ERF against computational complexity. To alleviate this issue, we propose VMIC, a learned remote sensing image compression network with a visual state space model, to achieve a better trade-off between computational complexity and performance. Specifically, instead of stacking small convolution kernels or heavy self-attention mechanisms, we employ a 2D bidirectional selective scan mechanism. Every element within the feature map aggregates data from multiple spatial positions, establishing a global effective receptive field with linear computational complexity. We extend it to an omni-selective scan for the global-spatial correlations within our Channel and Global Context Entropy Model (CGCM), enabling the integration of spatial and channel priors to minimize redundancy across slices. Experimental results demonstrate that the proposed method achieves a superior trade-off between rate-distortion performance and complexity. Furthermore, in comparison with traditional codecs and learned image compression algorithms, our model achieves BD-rates of −4.48% and −9.80% relative to the state-of-the-art VTM on the AID and NWPU VHR-10 datasets, respectively, as well as −6.73% and −7.93% on the panchromatic and multispectral images of the WorldView-3 remote sensing dataset.
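The BD-rate figures above follow the Bjøntegaard delta-rate idea: fit log-rate as a cubic polynomial of PSNR for each codec, integrate both fits over the overlapping PSNR range, and convert the average log-rate difference back into a percentage. The pure-Python sketch below implements that classic polynomial variant; it is an illustration, not the evaluation script used in the paper, and reference tools may use other interpolation schemes.

```python
import math

def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def cubic_fit(xs, ys):
    """Least-squares cubic y = c0 + c1 x + c2 x^2 + c3 x^3 via the normal equations."""
    a = [[sum(x ** (i + j) for x in xs) for j in range(4)] for i in range(4)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(4)]
    return solve(a, b)

def integral(c, lo, hi):
    """Definite integral of sum_k c[k] * x**k over [lo, hi]."""
    def antiderivative(x):
        return sum(c[k] * x ** (k + 1) / (k + 1) for k in range(len(c)))
    return antiderivative(hi) - antiderivative(lo)

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Bjontegaard delta rate in percent (negative = test codec needs fewer bits)."""
    lo = max(min(psnr_ref), min(psnr_test))   # overlapping PSNR interval
    hi = min(max(psnr_ref), max(psnr_test))
    mid = (lo + hi) / 2                        # centre PSNR to keep the fit well conditioned
    c_ref = cubic_fit([p - mid for p in psnr_ref], [math.log(r) for r in rates_ref])
    c_test = cubic_fit([p - mid for p in psnr_test], [math.log(r) for r in rates_test])
    avg_diff = (integral(c_test, lo - mid, hi - mid)
                - integral(c_ref, lo - mid, hi - mid)) / (hi - lo)
    return (math.exp(avg_diff) - 1.0) * 100.0

psnrs = [30.0, 32.5, 35.0, 37.5]
anchor = [0.2, 0.35, 0.6, 1.0]        # hypothetical bpp values
tested = [0.9 * r for r in anchor]    # same quality at 10% fewer bits
print(bd_rate(anchor, psnrs, tested, psnrs))  # close to -10
```

Negative values mean the tested codec needs fewer bits for the same PSNR, matching the sign convention of the figures reported above.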
- Research Article
- 10.1109/tgrs.2026.3668020
- Jan 1, 2026
- IEEE Transactions on Geoscience and Remote Sensing
With the rapid advancement of remote sensing satellites toward higher spatial resolution and revisit frequency, the explosive growth of image data has posed severe challenges to the efficiency of space-to-ground transmission. Traditional onboard compression standards, such as JPEG2000, often fail to maintain satisfactory reconstruction quality under high compression ratios, limiting their applicability in large-scale remote sensing scenarios. Although learned image compression (LIC) methods have achieved remarkable improvements in rate-distortion (RD) performance, their high computational complexity hinders deployment on resource-constrained onboard platforms. To address these challenges, this paper proposes RS-LLIC, a lightweight learned image compression framework with knowledge distillation tailored for onboard remote sensing, following the “onboard encoding and ground decoding” paradigm. Specifically, an efficient encoder architecture is designed to significantly reduce onboard computational costs, while a knowledge distillation-based training strategy is introduced to guide the lightweight encoder in feature learning using a teacher model, thereby improving RD performance without incurring additional inference overhead. Experimental results on multiple remote sensing datasets demonstrate that the proposed RS-LLIC achieves superior compression performance with extremely low encoder complexity, providing an effective solution for high-quality and efficient onboard remote sensing image compression. The code will be released on https://github.com/dy196/RS-LLIC.
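A common form of feature-level knowledge distillation adds a term that pulls the student's intermediate features toward those of a frozen teacher. The abstract does not say which features or distance RS-LLIC uses, so the sketch below simply assumes mean squared error between flattened feature vectors and a hypothetical weight `alpha`. Because the teacher appears only in the training loss, the deployed encoder carries no extra inference cost, consistent with the abstract.

```python
def mse(a, b):
    """Mean squared error between two equally long feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distillation_loss(rd_loss, student_feats, teacher_feats, alpha):
    """Student training objective: its own R-D loss plus alpha * feature matching.
    The teacher is frozen; only the student's parameters would be updated."""
    return rd_loss + alpha * mse(student_feats, teacher_feats)

# Hypothetical values: an R-D loss of 1.5 and a 2-element feature vector.
print(distillation_loss(1.5, [1.0, 2.0], [1.0, 4.0], alpha=0.5))
```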
- Research Article
9
- 10.1016/j.neucom.2022.07.065
- Jul 22, 2022
- Neurocomputing
Successive learned image compression: Comprehensive analysis of instability
- Conference Article
355
- 10.1109/cvpr52688.2022.00563
- Jun 1, 2022
Recently, learned image compression techniques have achieved remarkable performance, even surpassing the best manually designed lossy image coders, and are promising candidates for large-scale adoption. For the sake of practicality, a thorough investigation of the architecture design of learned image compression, regarding both compression performance and running speed, is essential. In this paper, we first propose uneven channel-conditional adaptive coding, motivated by the observation of energy compaction in learned image compression. Combining the proposed uneven grouping model with existing context models, we obtain a spatial-channel contextual adaptive model that improves coding performance without sacrificing running speed. Then we study the structure of the main transform and propose an efficient model, ELIC, that achieves state-of-the-art speed and compression ability. In addition to its superior performance, the proposed model supports extremely fast preview decoding and progressive decoding, which makes the upcoming application of learning-based image compression more promising.
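In channel-conditional coding, the latent channels are split into groups that are entropy-coded one after another, each conditioned on all previously decoded groups. The sketch below compares an uneven split with an even one. The 320-channel latent and the 16/16/32/64/192 split are assumptions that we believe match the configuration reported for ELIC and should be checked against the paper; the function names are hypothetical.

```python
from itertools import accumulate

def group_slices(group_sizes):
    """(start, end) channel ranges for consecutive channel groups."""
    ends = list(accumulate(group_sizes))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))

def decoded_context_sizes(group_sizes):
    """Group k is entropy-coded conditioned on every channel of groups 0..k-1;
    return how many such already-decoded channels each group can use."""
    return [start for start, _ in group_slices(group_sizes)]

uneven = [16, 16, 32, 64, 192]   # assumed ELIC-style split of a 320-channel latent
even = [64] * 5                  # even split with the same number of groups
print(group_slices(uneven))
print(decoded_context_sizes(uneven), decoded_context_sizes(even))
```

With the uneven split, the first groups are small and coded with little context, while the largest group is coded last with the most context available.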
- Conference Article
8
- 10.1109/icassp43922.2022.9747652
- May 23, 2022
Recently, learned image compression methods have shown outstanding rate-distortion performance compared to traditional frameworks. Although considerable progress has been made in learned image compression, the computational cost remains high. To address this problem, we propose AdderIC, which utilizes adder neural networks (AdderNet) to construct an image compression framework. Based on the characteristics of image compression, we introduce several strategies to improve the performance of AdderNet in this field. Specifically, the Haar Wavelet Transform is adopted to help AdderIC learn high-frequency information efficiently. In addition, implicit deconvolution with a kernel size of 1 is applied after each adder layer to reduce spatial redundancy. Moreover, we develop a novel Adder-ID-PixelShuffle cascade upsampling structure to remove checkerboard artifacts. Experiments demonstrate that AdderIC largely outperforms conventional AdderNet when applied to image compression and achieves rate-distortion performance comparable to its CNN baseline while reducing multiplication FLOPs by about 80% and energy consumption by about 30%.
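AdderNet replaces the multiply-accumulate of a convolution with the negative L1 distance between the input patch and the filter, so the layer uses only additions and subtractions. The 1D sketch below contrasts the two operations; it omits everything else in AdderIC (the Haar transform, implicit deconvolution, and upsampling structure described above), and the function names are hypothetical.

```python
def conv1d(x, w):
    """Ordinary (valid) correlation: each output is a sum of products."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def adder1d(x, w):
    """AdderNet-style layer: each output is -sum |x - w|, no multiplications."""
    k = len(w)
    return [-sum(abs(x[i + j] - w[j]) for j in range(k)) for i in range(len(x) - k + 1)]

signal, kernel = [1, 2, 3, 4], [1, 2]
print(conv1d(signal, kernel))
print(adder1d(signal, kernel))
```

The adder output is largest (closest to zero) where the input patch most resembles the filter, which plays the role that a large correlation plays in a convolution.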
- Research Article
22
- 10.1016/j.sigpro.2022.108778
- Sep 12, 2022
- Signal Processing
Learned image compression with generalized octave convolution and cross-resolution parameter estimation