Progressive Learning with Visual Prompt Tuning for Variable-Rate Image Compression
In this paper, we propose a progressive learning paradigm for transformer-based variable-rate image compression. Our approach covers a wide range of compression rates with the assistance of a Layer-adaptive Prompt Module (LPM). Inspired by visual prompt tuning, we use the LPM to extract prompts from input images at the encoder side and from hidden features at the decoder side. These prompts are fed as additional information into the Swin transformer layers of a pre-trained transformer-based image compression model, influencing the allocation of attention regions and bits and thereby changing the model's target compression ratio. To keep the network lightweight, we integrate prompt networks with fewer convolutional layers. Exhaustive experiments show that, compared to methods based on multiple models optimized separately for different target rates, the proposed method achieves the same performance with 80% savings in parameter storage and 90% savings in training data. Meanwhile, our model outperforms all current variable-rate image compression methods in rate-distortion performance and approaches the state-of-the-art fixed-rate image compression methods trained from scratch.
- Conference Article
13
- 10.1145/3503161.3547880
- Oct 10, 2022
Learning-based methods have effectively advanced the image compression community. Meanwhile, variational autoencoder (VAE) based variable-rate approaches have recently gained much attention, as they avoid training a set of different networks for various compression rates. Despite the remarkable performance that has been achieved, these approaches are readily corrupted once multiple compression/decompression operations are executed, causing image quality to drop tremendously and strong artifacts to appear. Thus, we tackle the issue of high-fidelity fine variable-rate image compression and propose the Invertible Activation Transformation (IAT) module. We implement the IAT in a mathematically invertible manner on a single-rate Invertible Neural Network (INN) based model, and the quality level (QLevel) is fed into the IAT to generate scaling and bias tensors. IAT and QLevel together give the image compression model fine variable-rate control while better maintaining image fidelity. Extensive experiments demonstrate that a single-rate image compression model equipped with our IAT module achieves variable-rate control without any performance compromise, and our IAT-embedded model obtains rate-distortion performance comparable to recent learning-based image compression methods. Furthermore, our method outperforms the state-of-the-art variable-rate image compression method by a large margin, especially after multiple re-encodings.
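The scaling-and-bias mechanism described in this abstract amounts to an element-wise affine map that is exactly invertible. A minimal plain-Python sketch of that invertibility property follows; the function names and toy values are assumptions for illustration, not the paper's implementation (in the paper, the scale and bias tensors are produced by a network conditioned on QLevel):

```python
def iat_forward(x, scale, bias):
    # element-wise affine transformation conditioned on the quality level;
    # because the map is exactly invertible, this step itself adds no
    # error under repeated re-encoding
    return [xi * s + b for xi, s, b in zip(x, scale, bias)]

def iat_inverse(y, scale, bias):
    # exact inverse of iat_forward (valid for nonzero scales)
    return [(yi - b) / s for yi, s, b in zip(y, scale, bias)]
```

A round trip through `iat_forward` and `iat_inverse` recovers the input up to floating-point precision, which is the fidelity-preserving property the abstract emphasizes.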
- Conference Article
266
- 10.1109/iccv.2019.00324
- Oct 1, 2019
In this paper, we propose a novel variable-rate learned image compression framework with a conditional autoencoder. Previous learning-based image compression methods mostly require training separate networks for different compression rates so they can yield compressed images of varying quality. In contrast, we train and deploy only one variable-rate image compression network implemented with a conditional autoencoder. We provide two rate control parameters, i.e., the Lagrange multiplier and the quantization bin size, which are given as conditioning variables to the network. Coarse rate adaptation to a target is performed by changing the Lagrange multiplier, while the rate can be further fine-tuned by adjusting the bin size used in quantizing the encoded representation. Our experimental results show that the proposed scheme provides a better rate-distortion trade-off than the traditional variable-rate image compression codecs such as JPEG2000 and BPG. Our model also shows comparable and sometimes better performance than the state-of-the-art learned image compression models that deploy multiple networks trained for varying rates.
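The two rate controls described above can be sketched in a few lines of plain Python; this is a toy illustration under the usual R + λD formulation, not the paper's actual network. The Lagrange multiplier λ sets the coarse operating point during conditioning, while the bin size Δ fine-tunes the rate at quantization time:

```python
def quantize(latent, delta):
    # uniform quantization with adjustable bin size delta; a larger
    # delta coarsens the representation, lowering the bitrate
    return [round(v / delta) * delta for v in latent]

def rd_loss(rate, distortion, lam):
    # rate-distortion objective R + lambda * D; sweeping lambda moves
    # the operating point along the R-D curve
    return rate + lam * distortion
```

Changing `lam` trades rate against distortion coarsely; changing `delta` at inference time refines the rate without touching the trained weights.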
- Conference Article
21
- 10.1109/icassp.2017.7952409
- Mar 1, 2017
This paper addresses the problem of image compression using sparse representations. We propose a variant of the autoencoder called the Stochastic Winner-Take-All Auto-Encoder (SWTA AE). “Winner-Take-All” means that image patches compete with one another when computing their sparse representation, and “Stochastic” indicates that a stochastic hyperparameter rules this competition during training. Unlike standard autoencoders, SWTA AE performs variable-rate image compression for images of any size after a single training, which is fundamental for compression. For comparison, we also propose a variant of Orthogonal Matching Pursuit (OMP) called Winner-Take-All Orthogonal Matching Pursuit (WTA OMP). In terms of rate-distortion trade-off, SWTA AE outperforms autoencoders but is worse than WTA OMP. Besides, SWTA AE can compete with JPEG in terms of rate-distortion.
- Conference Article
7
- 10.1109/icassp49357.2023.10095427
- Jun 4, 2023
Variable-rate mechanisms have improved the flexibility and efficiency of learning-based image compression, which otherwise trains multiple models for different rate-distortion tradeoffs. One of the most common approaches to variable rate is to scale the internal features channel-wise or spatially uniformly. However, the diversity of spatial importance is instructive for the bit allocation of image compression. In this paper, we introduce Spatial Importance Guided Variable-rate Image Compression (SigVIC), in which a spatial gating unit (SGU) is designed to adaptively learn a spatial importance mask. A spatial scaling network (SSN) then takes the spatial importance mask to guide feature scaling and bit allocation for variable rate. Moreover, to improve the quality of the decoded image, Top-K shallow features are selected to refine the decoded features through a shallow feature fusion module (SFFM). Experiments show that our method outperforms other learning-based methods (whether variable-rate or not) and traditional codecs, with storage savings and high flexibility.
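The spatially guided scaling this abstract describes can be sketched as element-wise multiplication of a feature map by an importance mask; the SGU/SSN that produce the mask are learned networks, so the mask values below are placeholders:

```python
def spatial_scale(features, mask):
    # element-wise scaling of an H x W feature map by an importance
    # mask in [0, 1]: important regions keep their magnitude (and thus
    # receive more bits after quantization), unimportant regions are
    # attenuated toward zero
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(features, mask)]
```

This is what distinguishes the approach from channel-wise or spatially uniform scaling: the scale factor varies per spatial location.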
- Conference Article
34
- 10.1109/cvprw50498.2020.00069
- Jun 1, 2020
In this paper, we propose a variable-rate image compression framework for the low bit-rate image compression task. Unlike most variational autoencoder (VAE) based methods, our proposal achieves a continuously variable rate in a single model by introducing a pair of gain units into the VAE. Besides, a content-adaptive optimization is applied to adapt the latent representation to the specific content while keeping the parameters of the network and the predictive model fixed. Due to the variable-rate characteristics of our method, each image can then be compressed to any quality level through a unified codec. Finally, an efficient rate control algorithm is designed to find the optimal bit allocation scheme under the constraint of the low-rate challenge.
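The gain-unit idea can be sketched as channel-wise scaling of the latent, with continuous rates obtained by interpolating between trained gain vectors. The exponential interpolation rule below follows the gain-unit literature and is an assumption about this paper's exact formulation; function names are illustrative:

```python
def apply_gain(latent_channels, gain):
    # channel-wise scaling of the latent by a learned gain vector;
    # each trained gain vector corresponds to one discrete rate point
    return [c * g for c, g in zip(latent_channels, gain)]

def interpolate_gain(g_low, g_high, l):
    # exponential interpolation (0 <= l <= 1) between two adjacent
    # trained gain vectors yields a continuum of rates from one model
    return [a ** (1 - l) * b ** l for a, b in zip(g_low, g_high)]
```

Sweeping `l` between 0 and 1 moves continuously between the two trained rate points, which is what allows a single model to hit any target quality.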
- Conference Article
3
- 10.1109/cvprw56347.2022.00179
- Jun 1, 2022
The recent success of self-supervised learning relies on its ability to learn representations from self-defined pseudo-labels that transfer to several downstream tasks. Motivated by this ability, we present a deep image compression technique that learns the lossy reconstruction of raw images from the self-supervised representation of a SimCLR ResNet-50 architecture. Our framework uses a feature pyramid to achieve variable-rate compression of the image, using a self-attention map for the optimal allocation of bits. The paper examines the effects of contrastive self-supervised representations and the self-attention map on the distortion and perceptual quality of the reconstructed image. Experiments are performed on different classes of images to show that the proposed method outperforms other variable-rate deep compression models without compromising the perceptual quality of the images.
- Research Article
26
- 10.1109/tmm.2021.3068523
- Jan 1, 2021
- IEEE Transactions on Multimedia
Recently, deep learning-based image compression has shown the potential to outperform traditional codecs. However, most existing methods train multiple networks for multiple bit rates, which increases the implementation complexity. In this paper, we propose a new variable-rate image compression framework, which employs generalized octave convolutions (GoConv) and generalized octave transposed-convolutions (GoTConv) with built-in generalized divisive normalization (GDN) and inverse GDN (IGDN) layers. Novel GoConv- and GoTConv-based residual blocks are also developed in the encoder and decoder networks. Our scheme also uses a stochastic rounding-based scalar quantization. To further improve the performance, we encode the residual between the input and the reconstructed image from the decoder network as an enhancement layer. To enable a single model to operate with different bit rates and to learn multi-rate image features, a new objective function is introduced. Experimental results show that the proposed framework trained with the variable-rate objective function outperforms standard codecs such as H.265/HEVC-based BPG and state-of-the-art learning-based variable-rate methods.
- Research Article
226
- 10.1109/78.150005
- Jan 1, 1992
- IEEE Transactions on Signal Processing
High-quality variable-rate image compression is achieved by segmenting an image into regions of different sizes, classifying each region into one of several perceptually distinct categories, and using a distinct coding procedure for each category. Segmentation is performed with a quadtree data structure by isolating the perceptually more important areas of the image into small regions and separately identifying larger random texture blocks. Since the important regions have been isolated, the remaining parts of the image can be coded at a lower rate than would be otherwise possible. High-quality coding results are achieved at rates between 0.35 and 0.7 b/p depending on the nature of the original image, and satisfactory results have been obtained at 0.25 b/p.
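The quadtree segmentation step can be sketched as a recursive split driven by a block-activity test. The variance criterion and threshold below are illustrative assumptions; the paper's actual classification is perceptual and distinguishes several categories, which this toy does not model:

```python
def variance(block):
    # sample variance of all pixel values in a square block
    vals = [v for row in block for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def quadtree_leaves(block, threshold, min_size=2):
    # recursively split while the block is "busy" (high variance);
    # smooth regions remain large leaves and can be coded cheaply,
    # while busy regions are isolated into small blocks
    n = len(block)
    if n <= min_size or variance(block) <= threshold:
        return [block]
    h = n // 2
    quads = [[row[:h] for row in block[:h]],   # top-left
             [row[h:] for row in block[:h]],   # top-right
             [row[:h] for row in block[h:]],   # bottom-left
             [row[h:] for row in block[h:]]]   # bottom-right
    leaves = []
    for q in quads:
        leaves.extend(quadtree_leaves(q, threshold, min_size))
    return leaves
```

A uniform block stays a single leaf; a block containing an edge splits into quadrants, concentrating bits where the image is perceptually important.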
- Conference Article
30
- 10.1109/icme46284.2020.9102877
- Jun 2, 2020
Recently, deep learning-based image compression has shown the potential to outperform traditional codecs. However, most existing methods train multiple networks for multiple bit rates, which increases the implementation complexity. In this paper, we propose a variable-rate image compression framework, which employs more Generalized Divisive Normalization (GDN) layers than previous GDN-based methods. Novel GDN-based residual sub-networks are also developed in the encoder and decoder networks. Our scheme also uses a stochastic rounding-based scalar quantization. To further improve the performance, we encode the residual between the input and the reconstructed image from the decoder network as an enhancement layer. To enable a single model to operate with different bit rates and to learn multi-rate image features, a new objective function is introduced. Experimental results show that the proposed framework trained with the variable-rate objective function outperforms standard codecs such as H.265/HEVC-based BPG and state-of-the-art learning-based variable-rate methods.
- Research Article
10
- 10.1109/tpami.2024.3356557
- Jun 1, 2024
- IEEE Transactions on Pattern Analysis and Machine Intelligence
Lossy image compression is a fundamental technology in media transmission and storage. Variable-rate approaches have recently gained much attention to avoid the usage of a set of different models for compressing images at different rates. During the media sharing, multiple re-encodings with different rates would be inevitably executed. However, existing Variational Autoencoder (VAE)-based approaches would be readily corrupted in such circumstances, resulting in the occurrence of strong artifacts and the destruction of image fidelity. Based on the theoretical findings of preserving image fidelity via invertible transformation, we aim to tackle the issue of high-fidelity fine variable-rate image compression and thus propose the Invertible Continuous Codec (I2C). We implement the I2C in a mathematical invertible manner with the core Invertible Activation Transformation (IAT) module. I2C is constructed upon a single-rate Invertible Neural Network (INN) based model and the quality level (QLevel) would be fed into the IAT to generate scaling and bias tensors. Extensive experiments demonstrate that the proposed I2C method outperforms state-of-the-art variable-rate image compression methods by a large margin, especially after multiple continuous re-encodings with different rates, while having the ability to obtain a very fine variable-rate control without any performance compromise.
- Conference Article
1
- 10.5220/0012273900003807
- Jan 1, 2023
Variable Rate Image Compression Based Adaptive Data Transfer Algorithm for Underwater Wireless Sensor Networks
- Conference Article
1
- 10.1109/acssc.1990.523352
- Nov 5, 1990
Techniques for clustering and the design of decision trees have been combined recently to produce codes. These tree-structured codes are efficient and easy to implement for problems of variable-rate image compression. This paper is a summary of some techniques for the resulting vector quantizers, which are explained in the context of designing decision trees. We describe how to grow large trees by splitting nodes individually, and how to prune these large trees by an algorithm termed the generalized BFOS algorithm. Estimation based on an independent test sample and on cross-validation both figure in the pruning algorithms.
- Conference Article
13
- 10.1109/cvprw50498.2020.00089
- Apr 15, 2020
Deep learning-based image compression methods have achieved superior performance compared with conventional transform-based codecs. With end-to-end Rate-Distortion Optimization (RDO) in the codec, the compression model is optimized with a Lagrange multiplier λ. In conventional codecs, the signal is decorrelated with an orthonormal transformation and a uniform quantizer is introduced. We propose a variable-rate image compression method with a dead-zone quantizer. First, the autoencoder network is trained with the RaDOGAGA [6] framework, which can make the latents isometric to the metric space, such as SSIM and MSE. Then the conventional dead-zone quantization method with an arbitrary step size is used in the single trained network to provide flexible rate control. With the dead-zone quantizer, the experimental results show that our method performs comparably with independently optimized models within a wide range of bitrates.
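A dead-zone scalar quantizer with an encode-time step size might look like the following sketch; the offset value is a typical choice from conventional codecs, not necessarily the one this paper uses:

```python
import math

def dead_zone_quantize(x, step, offset=1 / 3):
    # dead-zone quantizer: an offset below 1/2 widens the zero bin
    # relative to round-to-nearest, sending more small coefficients to
    # zero; the step size is a free encode-time parameter, so the rate
    # can be controlled after a single training
    sign = 1 if x >= 0 else -1
    index = sign * math.floor(abs(x) / step + offset)
    return index * step  # dequantized (reconstructed) value
```

With step 1.0, a coefficient of 0.4 falls into the widened zero bin while 0.8 survives, which is how the dead zone trades small-coefficient fidelity for rate.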
- Research Article
2
- 10.1016/j.patrec.2004.09.046
- Nov 11, 2004
- Pattern Recognition Letters
Adaptive modulated wavelet subband image coding
- Conference Article
4
- 10.1145/3595916.3626444
- Dec 6, 2023
Recently, neural network-based image compression techniques have demonstrated remarkable compression performance. The use of context-adaptive entropy models greatly enhances rate-distortion (R-D) performance by effectively capturing spatial redundancy in latent representations. However, latent representations still contain some spatial correlations (e.g., the same spatial structure), which need to be eliminated by further processing. Moreover, many compression models are single-rate models, which makes it difficult to cover a wide range of bitrates. To address these issues, we propose a novel variable-rate image compression algorithm that efficiently leverages bi-resolution spatial-channel information through learned mechanisms. In this paper, we first propose a BRP network that divides our latent representations and side information into HR and LR components, eliminating spatial redundancy at the same locations. Combining the spatial-channel context, we propose a BSC context model, including a decreasing-granularity checkerboard pattern and channel grouping based on a cosine slicing strategy. To cover a wide range of bitrates, we take a weight map as input to control bit allocation, achieving multiple compression rates. Our experimental results show that our method provides a better rate-distortion trade-off than BPG, JPEG, and other recent deep learning-based image compression methods.