Optimized Decoupled Structure with Non-Local Attention for Deep Image Compression

  • Abstract
  • Literature Map
  • Similar Papers
Abstract

Recently, a decoupled framework for learning-based image compression has been proposed and adopted into the JPEG AI image coding standard developed by ISO/IEC WG1. The decoupled structure disentangles the sample reconstruction process from the entropy decoding process, making decoding extremely fast. The corresponding techniques constitute essential parts of the JPEG AI verification model software. However, its analysis and synthesis transforms are relatively simple, built from stacked convolution layers, and thus may lack the capability to capture data correlations. In this work, we enhance the transform networks by introducing the non-local attention mechanism, which has proven effective in image compression tasks. The proposed framework thus combines the fast decoding of the decoupled architecture with the strong transform capability of non-local attention, making it a stronger candidate for practical end-to-end image codec deployment. Experimental results on the Kodak test set and the JPEG AI CfP test set show that our method achieves better BD-Rate performance than the original Decoupled-anchor and significantly faster decoding than NIC. The proposed solution was adopted by the IEEE 1857.11 Working Subgroup (1857.11 WSG) at its 10th meeting for developing neural network-based image coding standards.
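
For readers unfamiliar with the mechanism, a single non-local attention step over a feature map can be sketched as follows. This is a minimal NumPy illustration of the general idea, not the network used in the proposed framework; the projection matrices `w_theta`, `w_phi`, `w_g` are illustrative stand-ins for learned 1x1 convolutions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_theta, w_phi, w_g):
    """Non-local attention over an (H, W, C) feature map.

    Every output position is a weighted sum over ALL positions, so
    long-range correlations are captured in a single layer -- unlike a
    stacked convolution whose receptive field grows only gradually.
    """
    h, w, c = x.shape
    flat = x.reshape(h * w, c)                    # (N, C), N = H*W
    theta = flat @ w_theta                        # queries
    phi = flat @ w_phi                            # keys
    g = flat @ w_g                                # values
    attn = softmax(theta @ phi.T / np.sqrt(theta.shape[1]))  # (N, N)
    out = attn @ g                                # aggregate all positions
    return x + out.reshape(h, w, c)               # residual connection

rng = np.random.default_rng(0)
c = 8
x = rng.standard_normal((4, 4, c))
wt, wp, wg = (rng.standard_normal((c, c)) * 0.1 for _ in range(3))
y = non_local_block(x, wt, wp, wg)
```

The residual form means the block can be dropped into an existing analysis or synthesis transform without changing tensor shapes.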

Similar Papers
  • Conference Article
  • 10.1109/dcc52660.2022.00010
Improved Deep Image Compression with Joint Optimization of Cross Channel Context Model And Generalized Loop Filter
  • Mar 1, 2022
  • Changyue Ma + 3 more

Among recent deep image compression frameworks, transform coding combined with a context-adaptive entropy model is the most representative approach for achieving the best coding performance. For the entropy model, 2D masked convolution is widely used to capture the spatial context, but it omits correlations along the channel dimension. To complement the spatial context, a cross-channel context model is proposed. For the transform, we investigate how additional network layers, given to improve representation ability, should be allocated between the forward and inverse transforms. After analyzing the deep image compression scheme connected with a loop filter, we find this allocation can be regarded as a more generalized loop filter. The proposed cross-channel context model and generalized loop filter (CCCMGLF) are integrated into the deep image compression framework and jointly optimized to improve coding performance. Experimental results demonstrate that, using PSNR as the distortion metric, the proposed CCCMGLF outperforms VTM-11.0 by 1.20%, 10.82%, and 5.38% in BD-rate reduction for the Y, U, and V components, respectively, on the Kodak dataset. For the JVET CTC sequences, the proposed method outperforms VTM-11.0 by 1.44% for Y but has a coding performance loss of 24.74% and 11.91% for U and V, respectively. Over the baseline deep compression framework, the proposed method provides 7.80%, 12.66%, and 11.15% improvement for Y, U, and V, respectively, on the Kodak dataset, and 9.10%, 12.27%, and 12.68% for the JVET CTC sequences. The proposed approaches are applicable to both image compression and intra coding in video compression.
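
The cross-channel idea can be illustrated with a minimal sketch: entropy-model parameters for channel k are predicted from already-decoded channels 0..k-1. The linear per-channel predictors below are invented placeholders for the paper's learned network; only the causal channel ordering is taken from the abstract.

```python
import numpy as np

def cross_channel_context(latent, predictors):
    """Predict each channel's entropy-model mean from decoded channels.

    latent: (C, H, W).  predictors[k] maps channels 0..k-1 to channel
    k's mean, so channel correlations missed by spatial-only (masked
    convolution) context are exploited.  Channel 0 has no causal
    context and keeps a fixed prior (mu = 0).
    """
    c, h, w = latent.shape
    mu = np.zeros_like(latent)
    for k in range(1, c):
        ctx = latent[:k].reshape(k, -1)           # causal channel context
        mu[k] = (predictors[k] @ ctx).reshape(h, w)
    return mu

rng = np.random.default_rng(1)
c, h, w = 4, 3, 3
y = rng.standard_normal((c, h, w))
preds = {k: rng.standard_normal((1, k)) * 0.1 for k in range(1, c)}
mu = cross_channel_context(y, preds)
```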

  • Research Article
  • Cited by 5
  • 10.1002/int.22769
Deep image compression with lifting scheme: Wavelet transform domain based on high‐frequency subband prediction
  • Dec 8, 2021
  • International Journal of Intelligent Systems
  • M I Anju + 1 more

Image compression is an important image processing method extensively deployed in many applications. The discrete wavelet transform (DWT) is one of the most widely adopted transforms for compressing images, usually in its convolution-based form. The lifting-based DWT scheme, however, deserves more attention for its proficient performance and lower computation cost. This paper proposes a deep learning-based image compression model with a lifting scheme for predicting high-frequency subbands. The fine-tuning of the lifting factorization is done by a new Sea Lion with Averaged Update Evaluation that includes a new cosine estimation under the COordinate Rotation DIgital Computer (CORDIC) algorithm. The study also defines a new single objective function that merges multiple constraints, namely the peak signal-to-noise ratio (PSNR) and compression ratio (CR). Finally, the superiority of the presented approach is demonstrated with respect to various measures, such as CR and PSNR.
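
The lifting scheme itself is simple to demonstrate. Below is a plain Haar lifting step (predict, then update) in NumPy; the paper replaces the fixed predictor with a learned high-frequency predictor, which this sketch does not model.

```python
import numpy as np

def haar_lift_forward(x):
    """One lifting-scheme Haar step: predict, then update.

    x: 1-D signal of even length.  Returns (approx, detail) subbands.
    """
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    detail = odd - even               # predict: odd samples from even ones
    approx = even + detail / 2.0      # update: preserves the signal mean
    return approx, detail

def haar_lift_inverse(approx, detail):
    """Undo the lifting steps in reverse order: perfect reconstruction."""
    even = approx - detail / 2.0
    odd = detail + even
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

sig = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0])
a, d = haar_lift_forward(sig)
rec = haar_lift_inverse(a, d)
```

Because each lifting step is trivially invertible, any predictor (including a learned one) can be substituted without losing perfect reconstruction.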

  • Research Article
  • Cited by 9
  • 10.1016/j.jvcir.2021.103226
Deep image compression with multi-stage representation
  • Jul 21, 2021
  • Journal of Visual Communication and Image Representation
  • Zixi Wang + 3 more

  • Research Article
  • Cited by 24
  • 10.1109/tip.2023.3251020
CBANet: Toward Complexity and Bitrate Adaptive Deep Image Compression Using a Single Network.
  • Jan 1, 2023
  • IEEE Transactions on Image Processing
  • Jinyang Guo + 2 more

In this work, we propose a new deep image compression framework called Complexity and Bitrate Adaptive Network (CBANet) that aims to learn one single network to support variable bitrate coding under various computational complexity levels. In contrast to the existing state-of-the-art learning-based image compression frameworks that only consider the rate-distortion trade-off without introducing any constraint related to the computational complexity, our CBANet considers the complex rate-distortion-complexity trade-off when learning a single network to support multiple computational complexity levels and variable bitrates. Since it is a non-trivial task to solve such a rate-distortion-complexity related optimization problem, we propose a two-step approach to decouple this complex optimization task into a complexity-distortion optimization sub-task and a rate-distortion optimization sub-task, and additionally propose a new network design strategy by introducing a Complexity Adaptive Module (CAM) and a Bitrate Adaptive Module (BAM) to respectively achieve the complexity-distortion and rate-distortion trade-offs. As a general approach, our network design strategy can be readily incorporated into different deep image compression methods to achieve complexity and bitrate adaptive image compression by using a single network. Comprehensive experiments on two benchmark datasets demonstrate the effectiveness of our CBANet for deep image compression. Code is released at https://github.com/JinyangGuo/CBANet-release.

  • Research Article
  • Cited by 14
  • 10.1016/j.jvcir.2022.103573
Deep image compression based on multi-scale deformable convolution
  • Jul 6, 2022
  • Journal of Visual Communication and Image Representation
  • Daowen Li + 3 more

  • Research Article
  • Cited by 16
  • 10.1109/tip.2024.3504282
Saliency Segmentation Oriented Deep Image Compression with Novel Bit Allocation.
  • Jan 1, 2025
  • IEEE Transactions on Image Processing
  • Yuan Li + 3 more

Image compression distortion can degrade the performance of machine analysis tasks, so recent years have witnessed fast progress in deep image compression methods optimized for machine perception. However, such investigation is still lacking for saliency segmentation. First, we propose a deep compression network that increases the local signal fidelity of image pixels important for saliency segmentation, unlike existing methods that back-propagate the analysis network's loss. By this means, the two types of networks are decoupled, improving the compatibility of the proposed compression method with diverse saliency segmentation networks. Second, pixel-level bit weights are modeled with a probability distribution in the proposed bit allocation method. The ascending cosine roll-down (ACRD) function allocates bits to the important pixels, which fits the view of saliency segmentation as a pixel-level binary classification task. Third, the compression network is trained without the help of saliency segmentation: latent representations are decomposed into base and enhancement channels, where base channels are retained for the whole image while enhancement channels are used only for important pixels, so more bits benefit saliency segmentation via the enhancement channels. Extensive experimental results demonstrate that the proposed method saves an average of 10.34% bitrate compared with the state-of-the-art deep image compression method, with rate-accuracy (R-A) performance evaluated on sixteen downstream saliency segmentation networks and five conventional SOD datasets.
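
The exact ACRD definition is not given in this summary, so the sketch below is only a guessed illustration of a cosine-shaped roll-down of bit weights from salient to non-salient pixels; the `floor` parameter (keeping a minimum weight everywhere, echoing the retained base channels) is an invented name.

```python
import numpy as np

def cosine_rolldown_weights(saliency, floor=0.2):
    """Pixel-level bit weights that roll down along a half-cosine.

    Salient pixels (saliency near 1) get weight near 1; the weight
    rolls off smoothly, and non-salient pixels keep a small floor.
    This is an illustrative guess at the ACRD shape, not its formula.
    """
    lo, hi = saliency.min(), saliency.max()
    s = (saliency - lo) / (hi - lo + 1e-12)       # normalize to [0, 1]
    roll = 0.5 * (1.0 - np.cos(np.pi * s))        # 0 at s=0, 1 at s=1
    return floor + (1.0 - floor) * roll

sal = np.linspace(0.0, 1.0, 5)                    # toy saliency ramp
w = cosine_rolldown_weights(sal)
```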

  • Conference Article
  • Cited by 25
  • 10.1109/icip.2018.8451411
Deep Image Compression with Iterative Non-Uniform Quantization
  • Oct 1, 2018
  • Jianrui Cai + 1 more

Image compression, which aims to represent an image with less storage space, is a classical problem in image processing. Recently, by training an encoder-quantizer-decoder network, deep convolutional neural networks (CNNs) have achieved promising results in image compression. As a non-differentiable part of the compression system, the quantizer is hard to update during network training. Most existing deep image compression methods adopt a uniform rounding function as the quantizer, which restricts the capability and flexibility of CNNs in compressing complex image structures. In this paper, we present an iterative non-uniform quantization scheme for deep image compression. More specifically, we alternately optimize the quantizer and the encoder-decoder: when the encoder-decoder is fixed, a non-uniform quantizer is optimized based on the distribution of the representation features; the encoder-decoder network is then updated with the quantizer fixed. Extensive experiments demonstrate the superior PSNR of the proposed method over existing deep compressors and JPEG2000.
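
The quantizer-side half of this alternation resembles the classical Lloyd-Max design. A 1-D sketch, assuming the encoder-decoder is held fixed and only the quantizer is fitted to the feature distribution:

```python
import numpy as np

def lloyd_max(samples, levels, iters=50):
    """Alternating optimization of a non-uniform scalar quantizer.

    Boundary step: decision thresholds sit midway between levels.
    Centroid step: each level moves to the mean of its samples.
    This mirrors fitting the quantizer to the feature distribution
    while the encoder-decoder is frozen.
    """
    centers = np.quantile(samples, np.linspace(0, 1, levels))
    for _ in range(iters):
        bounds = (centers[:-1] + centers[1:]) / 2.0
        idx = np.digitize(samples, bounds)
        centers = np.array([
            samples[idx == k].mean() if np.any(idx == k) else centers[k]
            for k in range(levels)
        ])
    return centers

rng = np.random.default_rng(2)
feat = rng.laplace(size=5000)        # latent features are heavy-tailed
centers = lloyd_max(feat, levels=8)
spacing = np.diff(np.sort(centers))  # non-uniform: tight near the peak
```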

  • Research Article
  • Cited by 66
  • 10.1109/tcsvt.2022.3199472
Joint Graph Attention and Asymmetric Convolutional Neural Network for Deep Image Compression
  • Jan 1, 2023
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Zhisen Tang + 5 more

Recent deep image compression methods have achieved prominent progress through the nonlinear modeling and powerful representation capabilities of neural networks. However, most existing learning-based image compression approaches employ a customized convolutional neural network (CNN) that treats all pixels equally, neglecting the effect of local key features. Meanwhile, the convolutional filters in a CNN express the local spatial relationship within the receptive field and seldom consider long-range dependencies from distant locations, so the long-range dependencies of latent representations are not fully compressed. To address these issues, an end-to-end image compression method is proposed that integrates graph attention and an asymmetric convolutional neural network (ACNN). Specifically, ACNN is used to strengthen the effect of local key features and reduce the cost of model training. Graph attention is introduced into image compression to address the bottleneck of CNNs in modeling long-range dependencies. Regarding the limitation that existing attention mechanisms for image compression hardly share information, we propose a self-attention approach which allows information flow to achieve reasonable bit allocation. The proposed self-attention approach complies with the perceptual characteristics of the human visual system, as information can interact through the attention modules, and it takes channel-level relationships and positional information into account to improve the compression of rich-texture regions. Experimental results demonstrate that the proposed method, when optimized for MS-SSIM, achieves state-of-the-art rate-distortion performance compared to recent deep compression models on the Kodak and Tecnick benchmark datasets. The project page with the source code can be found at https://mic.tongji.edu.cn.
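
A minimal sketch of adjacency-masked (graph) attention, using simplified dot-product scoring rather than this paper's exact formulation; the chain graph and shared projection below are illustrative choices.

```python
import numpy as np

def graph_attention(feat, adj, w):
    """One graph-attention aggregation step with an adjacency mask.

    feat: (N, C) node features (e.g. image patches), adj: (N, N) 0/1
    adjacency with self-loops, w: (C, C) shared projection.  Scores of
    non-adjacent pairs are masked to -inf before the softmax, so each
    node aggregates information only from its graph neighbours.
    """
    proj = feat @ w
    scores = proj @ proj.T / np.sqrt(proj.shape[1])
    scores = np.where(adj > 0, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(scores)                       # exp(-inf) == 0: masked out
    attn = e / e.sum(axis=1, keepdims=True)
    return attn, attn @ feat

n, c = 6, 4
rng = np.random.default_rng(9)
feat = rng.standard_normal((n, c))
adj = np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)  # chain + self-loops
attn, out = graph_attention(feat, adj, rng.standard_normal((c, c)) * 0.5)
```

Replacing the chain adjacency with a learned or similarity-based graph is what lets such models reach distant but related image regions.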

  • Conference Article
  • Cited by 1
  • 10.1109/icoin50884.2021.9333956
Learned Image Compression with Frequency Domain Loss
  • Jan 13, 2021
  • Soonbin Lee + 3 more

This paper proposes an end-to-end deep image compression model with a frequency-domain loss function. Unlike previous deep image compression methods, the model is trained jointly in the frequency domain. By computing the loss in the frequency domain, the model incorporates high-frequency components to effectively capture detailed information in the reconstructed images. Frequency-domain processing underlies modern image/video codecs (e.g., JPEG), but it has seldom been investigated in neural-network-based deep image compression models. The model shows better image compression performance when visual quality is measured by the peak signal-to-noise ratio, and its rate-distortion performance outperforms traditional neural-network-based models when trained jointly in the frequency domain. The improvement is especially pronounced at low bitrates. Moreover, the method is easily applicable to other compression models.
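
A frequency-domain distortion term can be sketched as follows. The high-frequency up-weighting is an illustrative choice of ours, since the abstract only specifies that the loss is computed jointly in the frequency domain.

```python
import numpy as np

def frequency_domain_loss(x, x_hat, hf_weight=2.0):
    """L1 loss on 2-D FFT magnitudes, up-weighting high frequencies.

    Bins farther from the spectrum centre (after fftshift) get a
    linearly increasing weight up to hf_weight, so missing detail
    (high-frequency content) is penalized more than missing DC.
    """
    fx = np.fft.fftshift(np.fft.fft2(x))
    fy = np.fft.fftshift(np.fft.fft2(x_hat))
    h, w = x.shape
    yy, xx = np.mgrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    weight = 1.0 + (hf_weight - 1.0) * radius / radius.max()
    return float(np.mean(weight * np.abs(fx - fy)))

rng = np.random.default_rng(3)
img = rng.random((16, 16))
# blurring removes high frequencies, so the loss should flag it
blurred = (img + np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)) / 3.0
loss_same = frequency_domain_loss(img, img)
loss_blur = frequency_domain_loss(img, blurred)
```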

  • Conference Article
  • Cited by 10
  • 10.1145/3394171.3413680
Instability of Successive Deep Image Compression
  • Oct 12, 2020
  • Jun-Hyuk Kim + 3 more

Successive image compression refers to the process of repeatedly encoding and decoding an image. It frequently occurs during the sharing, manipulation, and re-distribution of images. While deep learning-based methods have made significant progress on single-step compression, a thorough analysis of their performance under successive compression has not been conducted. In this paper, we conduct a comprehensive analysis of successive deep image compression. First, we introduce a new observation, the instability of successive deep image compression, which is not observed in JPEG, and discuss its causes. Then, we conduct a successive image compression benchmark of state-of-the-art deep learning-based methods and comparatively analyze the factors that affect the instability. Finally, we propose a new loss function for training deep compression models, called feature identity loss, to mitigate the instability of successive deep image compression.
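
The instability phenomenon is easy to reproduce with toy codecs. The sketch below uses two invented stand-in codecs (not deep models): an idempotent quantizer, which is stable under re-compression, and one that applies a small gain before quantizing, which drifts further on every pass.

```python
import numpy as np

def successive_drift(x, codec, rounds=10):
    """Apply a codec repeatedly and record drift from the first pass."""
    first = codec(x)
    out, drift = first, []
    for _ in range(rounds):
        out = codec(out)
        drift.append(float(np.max(np.abs(out - first))))
    return drift

# Toy stand-ins: the idempotent codec maps its own output to itself
# (re-compression is lossless), while the gain before quantization
# makes each extra pass move the signal further from the first result.
idempotent = lambda x: np.round(x * 4) / 4
drifting = lambda x: np.round(1.2 * x * 8) / 8
img = np.random.default_rng(7).random(100)
drift_idem = successive_drift(img, idempotent)
drift_gain = successive_drift(img, drifting)
```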

  • Conference Article
  • Cited by 5
  • 10.1117/12.2593500
A study of deep image compression for YUV420 color space
  • Aug 1, 2021
  • Changyue Ma + 3 more

Currently, most deep image compression methods are designed to compress images in the RGB color space. However, many images are stored in the YUV420 color space, and video coding standards such as H.265/HEVC and H.266/VVC support compression of YUV420 images through their respective Main Still Picture profiles. In this paper, we first study how to adjust deep compression frameworks designed for RGB images to compress images in the YUV420 color space. Then, we study the coding-performance impact of adjusting the training distortion weights for the YUV channels and compare the experimental results with the HEVC and VVC all-intra configurations. The proposed approaches are applicable to both image compression and intra coding in video compression.
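
The weighted-distortion idea can be sketched directly; the (0.7, 0.15, 0.15) split below is an invented example of the kind of training weights the paper studies, not its reported setting.

```python
import numpy as np

def weighted_yuv_distortion(orig, recon, weights=(0.7, 0.15, 0.15)):
    """Training distortion as a weighted sum of per-channel MSE.

    orig/recon: dicts with 'Y', 'U', 'V' planes (U and V are quarter
    size in 4:2:0).  Shifting weight toward Y trades chroma fidelity
    for luma fidelity, which is the trade-off the paper measures.
    """
    wy, wu, wv = weights
    mse = lambda a, b: float(np.mean((a - b) ** 2))
    return (wy * mse(orig['Y'], recon['Y'])
            + wu * mse(orig['U'], recon['U'])
            + wv * mse(orig['V'], recon['V']))

rng = np.random.default_rng(4)
o = {'Y': rng.random((8, 8)), 'U': rng.random((4, 4)), 'V': rng.random((4, 4))}
r = {k: v + 0.1 for k, v in o.items()}   # uniform 0.1 error everywhere
d = weighted_yuv_distortion(o, r)        # per-channel MSE is 0.01
```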

  • Conference Article
  • Cited by 15
  • 10.1109/wacv56688.2023.00256
Universal Deep Image Compression via Content-Adaptive Optimization with Adapters
  • Jan 1, 2023
  • Koki Tsubota + 2 more

Deep image compression performs better than conventional codecs, such as JPEG, on natural images. However, deep image compression is learning-based and encounters a problem: compression performance deteriorates significantly for out-of-domain images. In this study, we highlight this problem and address a novel task: universal deep image compression. This task aims to compress images belonging to arbitrary domains, such as natural images, line drawings, and comics. To address this problem, we propose a content-adaptive optimization framework that uses a pre-trained compression model and adapts it to a target image during compression. Adapters are inserted into the decoder of the model. For each input image, our framework optimizes the latent representation extracted by the encoder and the adapter parameters in terms of rate-distortion. The adapter parameters are additionally transmitted per image. For the experiments, a benchmark dataset containing uncompressed images from four domains (natural images, line drawings, comics, and vector arts) is constructed, and the proposed universal deep compression method is evaluated against non-adaptive and existing adaptive compression models, which it outperforms. The code and dataset are publicly available at https://github.com/kktsubota/universal-dic.
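
The per-image adaptation loop can be sketched with a toy linear decoder. All sizes, learning rates, and the additive adapter form here are illustrative assumptions; the real method inserts adapters into a learned convolutional decoder and optimizes a rate-distortion objective rather than plain distortion.

```python
import numpy as np

# Toy content-adaptive optimization: a frozen "decoder" W, a per-image
# latent z, and a tiny additive adapter a that would be transmitted
# alongside the bitstream.  Only z and a are optimized; W never moves.
rng = np.random.default_rng(8)
W = rng.standard_normal((16, 4))       # frozen decoder weights
target = rng.standard_normal(16)       # the (out-of-domain) image
z = np.zeros(4)                        # latent, optimized per image
a = np.zeros(16)                       # adapter params, also optimized
lr = 0.01
for _ in range(2000):
    err = W @ z + a - target           # reconstruction error
    z -= lr * (W.T @ err)              # gradient step on the latent
    a -= lr * err                      # gradient step on the adapter
final_err = float(np.mean((W @ z + a - target) ** 2))
```

Because the adapter has few parameters, its per-image transmission cost stays small relative to the distortion it removes.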

  • Research Article
  • 10.3390/app152412882
Noise-Aware Hybrid Compression of Deep Models with Zero-Shot Denoising and Failure Prediction
  • Dec 5, 2025
  • Applied Sciences
  • Lizhe Zhang + 5 more

Deep learning-based image compression achieves remarkable average rate-distortion performance but is prone to failure on noisy, high-frequency, or high-entropy inputs. This work systematically investigates these failure cases and proposes a noise-aware hybrid compression framework to address them. A High-Frequency Vulnerability Index (HFVI) is proposed, integrating frequency energy, encoder Jacobian sensitivity, and texture entropy into a unified measure of degradation susceptibility. Guided by HFVI, the system incorporates a selective zero-shot denoising module (P2PA) and a lightweight hybrid codec selector that determines, for each image, whether P2PA is necessary and selects the more reliable codec (a learning-based model or JPEG2000), without retraining any compression backbones. Experiments on a 200,000-image cross-domain benchmark incorporating general datasets, synthetic noise (eight levels), and real-noise datasets demonstrate that the proposed pipeline improves PSNR by up to 1.28 dB, raises SSIM by 0.02, reduces LPIPS by roughly 0.05, and decreases the failure-case rate by 6.7% over the best baseline (Joint-IC). Additional intensity-profile and cross-validation analyses further validate the robustness and deployment readiness of the method, showing that the hybrid selector provides a practical path toward reliable, noise-adaptive deep image compression.
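
As a simplified illustration of the routing idea (not the actual HFVI, which also uses encoder Jacobian sensitivity and texture entropy), one can gate on the high-frequency energy ratio alone; the 0.5 threshold and codec labels below are invented.

```python
import numpy as np

def hf_energy_ratio(img):
    """Fraction of spectral energy outside the low-frequency quarter."""
    f = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = img.shape
    ch, cw = h // 4, w // 4
    low = f[h//2-ch:h//2+ch, w//2-cw:w//2+cw].sum()
    return float(1.0 - low / f.sum())

def select_codec(img, threshold=0.5):
    """Route high-frequency (failure-prone) images to a fallback codec.

    A crude stand-in for the HFVI-guided selector: smooth content goes
    to the learned codec, broadband content to JPEG2000.
    """
    return 'jpeg2000' if hf_energy_ratio(img) > threshold else 'learned'

smooth = np.outer(np.hanning(32), np.hanning(32))        # low-frequency
noisy = np.random.default_rng(6).standard_normal((32, 32))  # broadband
```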

  • Research Article
  • Cited by 10
  • 10.1109/access.2023.3236086
Comprehensive Comparisons of Uniform Quantization in Deep Image Compression
  • Jan 1, 2023
  • IEEE Access
  • Koki Tsubota + 1 more

In deep image compression, uniform quantization is applied to the latent representations obtained with an auto-encoder architecture before entropy coding. Quantization poses a problem for end-to-end training: its gradient is zero almost everywhere, so it cannot backpropagate meaningful gradients. Many approximations of quantization have been proposed to obtain usable gradients, but there have been no equitable comparisons among them. In this study, we comprehensively compare the existing approximations of uniform quantization. Furthermore, we evaluate possible combinations of quantizers for the decoder and the entropy model, as the approximated quantizers can differ between them. We conduct experiments using three network architectures on two test datasets. The experimental results reveal that the best approximated quantization differs across network architectures, and the best approximations for all three differ from the ones originally used. We also show that combining universal quantization for the entropy model with differentiable soft quantization for the decoder is a comparatively good choice across architectures and datasets.
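
Two of the most common quantizer surrogates in this literature, additive uniform noise and the straight-through estimator, can be sketched in a few lines alongside the real rounding quantizer; they are representative examples, not this paper's full set of compared approximations.

```python
import numpy as np

def quantize(y, mode, rng=None):
    """Uniform quantization and common training-time approximations.

    'round': the real (non-differentiable) quantizer used at test time.
    'noise': additive uniform noise U(-0.5, 0.5), a standard surrogate
             for training the entropy model.
    'ste'  : straight-through estimator; the forward value equals
             round(y), written as y + (round(y) - y) to mimic
             y + stop_grad(round(y) - y), so gradients flow through y.
    """
    if mode == 'round':
        return np.round(y)
    if mode == 'noise':
        rng = rng or np.random.default_rng()
        return y + rng.uniform(-0.5, 0.5, size=y.shape)
    if mode == 'ste':
        return y + (np.round(y) - y)   # forward pass == round(y)
    raise ValueError(mode)

y = np.array([0.2, 1.7, -0.6])
hard = quantize(y, 'round')
ste = quantize(y, 'ste')
noisy = quantize(y, 'noise', rng=np.random.default_rng(5))
```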

  • Research Article
  • Cited by 20
  • 10.1109/jsac.2022.3221998
Progressive Deep Image Compression for Hybrid Contexts of Image Classification and Reconstruction
  • Jan 1, 2023
  • IEEE Journal on Selected Areas in Communications
  • Zhongyue Lei + 5 more

Progressive deep image compression (DIC) with hybrid contexts is an under-investigated problem that aims to jointly maximize the utility of a compressed image for multiple contexts or tasks under variable rates. In this paper, we consider the contexts of image reconstruction and classification. We propose a DIC framework, called residual-enhanced mask-based progressive generative coding (RMPGC), designed for explicit control of the performance within the rate-distortion-classification-perception (RDCP) trade-off. Three independent mechanisms are introduced to yield a semantically structured latent representation that can support parameterized control of rate and context adaptation. Experimental results show that the proposed RMPGC outperforms a benchmark DIC scheme using the same generative adversarial nets (GANs) backbone in all six metrics related to classification, distortion, and perception. Moreover, RMPGC is a flexible framework that can be applied to different neural network backbones. Some typical implementations are given and shown to outperform the classic BPG codec and four state-of-the-art DIC schemes in classification and perception metrics, with a slight degradation in distortion metrics. Our proposal of a nonlinear-neural-coded and richly structured latent space makes the proposed DIC scheme well suited for image compression in wireless communications, multi-user broadcasting, and multi-tasking applications.
