GABIC: Graph-Based Attention Block for Image Compression
While standardized codecs like JPEG and HEVC-intra remain the industry standard in image compression, neural Learned Image Compression (LIC) codecs represent a promising alternative. In particular, integrating attention mechanisms from Vision Transformers into LIC models has improved compression efficiency. However, the extra efficiency often comes at the cost of aggregating redundant features. This work proposes the Graph-based Attention Block for Image Compression (GABIC), a method that reduces feature redundancy through a k-Nearest-Neighbors-enhanced attention mechanism. Our experiments show that GABIC outperforms comparable methods, particularly at high bit rates.
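A minimal sketch of the underlying idea, assuming k-NN attention means each query attends only to its k most similar keys; GABIC's actual graph construction over local windows may differ from this simplification:

```python
# k-NN-restricted attention: keep only the k highest-scoring keys per query,
# mask out the rest. A simplified illustration, not GABIC's exact block.
import torch
import torch.nn.functional as F

def knn_attention(q, k, v, num_neighbors=8):
    """q, k, v: (batch, tokens, dim). Each query attends to its k best keys."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5  # (B, T, T)
    topk = scores.topk(num_neighbors, dim=-1)
    masked = torch.full_like(scores, float("-inf"))
    masked.scatter_(-1, topk.indices, topk.values)         # keep k best, drop rest
    return F.softmax(masked, dim=-1) @ v

x = torch.randn(1, 64, 32)                                 # 64 tokens, 32-dim
print(knn_attention(x, x, x).shape)                        # torch.Size([1, 64, 32])
```

Restricting each query to a fixed neighborhood is what suppresses the aggregation of redundant, weakly related features.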
- Conference Article
- 10.1109/cvpr52688.2022.01697
- Jun 1, 2022
Learned image compression methods have exhibited rate-distortion performance superior to classical image compression standards. Most existing learned image compression models are based on Convolutional Neural Networks (CNNs). Despite their great contributions, a main drawback of CNN-based models is that their structure is not designed to capture local redundancy, especially non-repetitive textures, which severely affects reconstruction quality. Therefore, how to make full use of both global structure and local texture becomes the core problem for learning-based image compression. Inspired by recent progress on the Vision Transformer (ViT) and Swin Transformer, we found that combining a local-aware attention mechanism with global-related feature learning can meet this expectation in image compression. In this paper, we first extensively study the effects of multiple kinds of attention mechanisms for local feature learning, then introduce a more straightforward yet effective window-based local attention block. The proposed window-based attention is very flexible and can work as a plug-and-play component to enhance CNN and Transformer models. Moreover, we propose a novel Symmetrical TransFormer (STF) framework with absolute transformer blocks in the down-sampling encoder and up-sampling decoder. Extensive experimental evaluations show that the proposed method is effective and outperforms the state-of-the-art methods. The code is publicly available at https://github.com/Googolxx/STF.
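A minimal sketch of window-based local attention in the spirit of this plug-and-play block: the feature map is split into non-overlapping windows and self-attention runs within each window. Layer sizes here are illustrative assumptions, not the paper's configuration:

```python
# Window attention: partition (B, C, H, W) into s*s windows, attend inside each.
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    def __init__(self, dim, window=8, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W); H, W divisible by window
        b, c, h, w = x.shape
        s = self.window
        # partition into (B * num_windows, s*s, C) token groups
        t = x.view(b, c, h // s, s, w // s, s).permute(0, 2, 4, 3, 5, 1)
        t = t.reshape(-1, s * s, c)
        t, _ = self.attn(t, t, t)              # attention within each window only
        t = t.view(b, h // s, w // s, s, s, c).permute(0, 5, 1, 3, 2, 4)
        return t.reshape(b, c, h, w)

y = WindowAttention(64)(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32])
```

Because attention cost is quadratic in token count, confining it to windows keeps the block cheap enough to drop into either a CNN or a Transformer codec.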
- Research Article
- 10.1109/access.2022.3195295
- Jan 1, 2023
- IEEE Access
Recently, learned image compression algorithms have shown impressive performance compared to classic hand-crafted image codecs. Despite these considerable achievements, a fundamental disadvantage is that such models are not optimized for retaining local redundancies, particularly non-repetitive patterns, which has a detrimental influence on reconstruction quality. This paper introduces an efficient autoencoder-style image compression method containing three novel blocks, i.e., an adjacent attention block, a Gaussian merge block, and a decoded-image refinement block, to improve overall compression performance. The adjacent attention block allocates the additional bits required to capture spatial correlations (both vertical and horizontal) and effectively removes worthless information. The Gaussian merge block assists rate-distortion optimization, while the decoded-image refinement block corrects defects in low-resolution reconstructed images. A comprehensive ablation study analyzes and evaluates the qualitative and quantitative capabilities of the proposed model. Experimental results on two publicly available datasets reveal that our method outperforms the state-of-the-art methods on the KODAK dataset (by around 4 dB and 5 dB) and the CLIC dataset (by about 4 dB and 3 dB) in terms of PSNR and MS-SSIM, respectively.
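The abstract gives the adjacent attention block only at a block-diagram level, so the sketch below merely illustrates one common way to attend along vertical and horizontal neighborhoods separately (axial-style attention); the paper's actual block may differ:

```python
# Axial attention: attend along each row, then along each column.
import torch
import torch.nn as nn

class AxialAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.row = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                                 # x: (B, C, H, W)
        b, c, h, w = x.shape
        r = x.permute(0, 2, 3, 1).reshape(b * h, w, c)    # tokens along each row
        r, _ = self.row(r, r, r)
        r = r.view(b, h, w, c)
        col = r.permute(0, 2, 1, 3).reshape(b * w, h, c)  # tokens along each column
        col, _ = self.col(col, col, col)
        return col.view(b, w, h, c).permute(0, 3, 2, 1)   # back to (B, C, H, W)

print(AxialAttention(32)(torch.randn(1, 32, 16, 16)).shape)
```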
- Research Article
- 10.1109/tnnls.2021.3104974
- Mar 1, 2023
- IEEE Transactions on Neural Networks and Learning Systems
The entropy of the codes usually serves as the rate loss in recent learned lossy image compression methods. Precise estimation of the probability distribution of the codes plays a vital role in reducing the entropy and boosting joint rate-distortion performance. However, existing deep-learning-based entropy models generally assume the latent codes are statistically independent or depend only on some side information or local context, which fails to take the global similarity within the context into account and thus hinders accurate entropy estimation. To address this issue, we propose a special nonlocal operation for context modeling that employs the global similarity within the context. Specifically, because of the causality constraint on the context, the nonlocal operation cannot be computed directly in context modeling. We exploit the relationship between the code maps produced by deep neural networks and introduce proxy similarity functions as a workaround. Then, we combine the local and global context via a nonlocal attention block and employ it in masked convolutional networks for entropy modeling. Considering that the width of the transforms is essential for training low-distortion models, we finally introduce a U-Net block in the transforms to increase the width with manageable memory consumption and time complexity. Experiments on the Kodak and Tecnick datasets demonstrate the superiority of the proposed context-based nonlocal attention block in entropy modeling and of the U-Net block in low-distortion settings. On the whole, our model performs favorably against existing image compression standards and recent deep image compression models.
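For context, a minimal sketch of the masked convolution that underlies such autoregressive context models: each latent position may only see previously decoded neighbors. The paper's proxy similarity functions and nonlocal block are omitted here:

```python
# Type-A causally masked convolution, as used in masked-conv context models.
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Zero out the weights for the current position and everything after it."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        mask = torch.ones_like(self.weight)
        _, _, kh, kw = self.weight.shape
        mask[:, :, kh // 2, kw // 2:] = 0   # current pixel and those to its right
        mask[:, :, kh // 2 + 1:, :] = 0     # all rows below
        self.register_buffer("mask", mask)

    def forward(self, x):
        self.weight.data *= self.mask       # re-enforce causality at every call
        return super().forward(x)

ctx = MaskedConv2d(192, 384, kernel_size=5, padding=2)
print(ctx(torch.randn(1, 192, 8, 8)).shape)  # torch.Size([1, 384, 8, 8])
```

The causal mask is exactly why a plain nonlocal operation is not directly computable here: positions after the current one have not been decoded yet.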
- Conference Article
- 10.1109/icme52920.2022.9859700
- Jul 18, 2022
Deep learning-based end-to-end image compression has achieved significant compression performance in recent years. However, current learning-based image compression methods are designed around the characteristics of the RGB color space and are not well suited to image compression in the YUV 420 color space because of the differences between the color formats. To achieve efficient image compression in YUV 420, we propose an information-preserving compression framework using the attention mechanism. Specifically, we design an information-preserving module (IPM), in which we utilize a dual-branch architecture to prevent changes in the data distribution and propose a feature attention block (FAB) to preserve information. Furthermore, a cross-channel progressive enhancement (CPE) network is designed to take advantage of the relations among different channels. Experimental results show that the proposed framework outperforms the state-of-the-art compression standard Versatile Video Coding (VVC) with a 2.52% BD-rate reduction on common test conditions (CTC) sequences on average.
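The abstract describes the IPM only at a block-diagram level; the sketch below assumes one common realization of a dual-branch design: an identity-like branch passes features through with minimal change while an attention branch gates them, and the two are fused:

```python
# Hypothetical dual-branch module: identity path plus channel-attention path.
import torch
import torch.nn as nn

class DualBranchIPM(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.identity = nn.Conv2d(dim, dim, 1)     # distribution-preserving branch
        self.gate = nn.Sequential(                 # attention branch (FAB-like gate)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim // 4, 1), nn.ReLU(),
            nn.Conv2d(dim // 4, dim, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.identity(x) + x * self.gate(x)

print(DualBranchIPM(64)(torch.randn(1, 64, 16, 16)).shape)
```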
- Research Article
- 10.1109/tip.2023.3319275
- Jan 1, 2023
- IEEE Transactions on Image Processing
Learned image compression methods have achieved satisfactory results in recent years. However, existing methods are typically designed for the RGB format and are not suitable for the YUV420 format because of the differences between the formats. In this paper, we propose an information-guided compression framework using a cross-component attention mechanism, which achieves efficient image compression in YUV420 format. Specifically, we design a dual-branch advanced information-preserving module (AIPM) based on an information-guided unit (IGU) and an attention mechanism. On the one hand, the dual-branch architecture prevents changes in the original data distribution and avoids information disturbance between different components, while the feature attention block (FAB) preserves the important information. On the other hand, the IGU efficiently exploits the correlations between the Y and UV components, further preserving the UV information under the guidance of Y. Furthermore, we design an adaptive cross-channel enhancement module (ACEM) that reconstructs details by utilizing the relations among components, using the reconstructed Y as textural and structural guidance for the UV components. Extensive experiments show that the proposed framework achieves state-of-the-art performance in image compression for the YUV420 format. More importantly, the proposed framework outperforms Versatile Video Coding (VVC) with an 8.37% BD-rate reduction on common test conditions (CTC) sequences on average. In addition, we propose a quantization scheme for the context model that requires no retraining, which overcomes the cross-platform decoding errors caused by floating-point operations in the context model and provides a reference approach for deploying neural codecs on different platforms.
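A minimal sketch of Y-guided UV coding for YUV420, assuming the common pattern of conditioning the chroma branch on downsampled luma features; the internals of the IGU and ACEM are not specified in the abstract:

```python
# In YUV420 the Y plane is (H, W) and the UV planes are (H/2, W/2), so the
# luma features are strided down before fusing with chroma features.
import torch
import torch.nn as nn

class YGuidedUV(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.y_feat = nn.Conv2d(1, dim, 3, stride=2, padding=1)  # Y: H x W -> H/2 x W/2
        self.uv_feat = nn.Conv2d(2, dim, 3, padding=1)           # UV already at H/2 x W/2
        self.fuse = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, y, uv):
        g = self.y_feat(y)                   # align luma resolution with chroma
        return self.fuse(torch.cat([self.uv_feat(uv), g], dim=1))

y, uv = torch.randn(1, 1, 64, 64), torch.randn(1, 2, 32, 32)
print(YGuidedUV()(y, uv).shape)  # torch.Size([1, 64, 32, 32])
```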
- Research Article
- 10.26555/ijain.v10i3.1499
- Aug 31, 2024
- International Journal of Advances in Intelligent Informatics
Image compression is a crucial research topic in today's information age, especially for balancing compression efficiency against the quality of the reconstructed image. Common learned image compression methods are based on deep autoencoders. However, these methods are limited in that they consider only residual values in the processed images, achieving the existing compression efficiency with less satisfying reconstruction results. To address this issue, we introduce an attention block mechanism to further improve coding efficiency, and we add post-filtering to enhance the final image reconstruction. Experimental results on two datasets, CLIC for training and KODAK for testing, demonstrate that this method outperforms several previous approaches. With a coding-efficiency improvement of -28.16%, an average PSNR improvement of 34%, and an MS-SSIM improvement of 8%, the model in this study significantly enhances rate-distortion (RD) performance compared to previous approaches.
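The abstract names an attention block plus post-filtering; the sketch below shows only the residual post-filtering idea, in which a small CNN refines the decoded image and the refined output becomes the final reconstruction. The network shape is an illustrative assumption:

```python
# Residual post-filter: predict a correction to the decoded image.
import torch
import torch.nn as nn

class PostFilter(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, decoded):
        return decoded + self.body(decoded)   # refine rather than re-synthesize

print(PostFilter()(torch.randn(1, 3, 64, 64)).shape)
```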
- Research Article
- 10.3390/app11177803
- Aug 25, 2021
- Applied Sciences
Since high-quality realistic media are widely used in various computer vision applications, image compression is one of the essential technologies enabling real-time applications. Image compression generally causes undesired compression artifacts, such as blocking artifacts and ringing effects. In this study, we propose a densely cascading image restoration network (DCRN), which consists of an input layer, a densely cascading feature extractor, a channel attention block, and an output layer. The densely cascading feature extractor has three densely cascading (DC) blocks, and each DC block contains two convolutional layers, five dense layers, and a bottleneck layer. To optimize the proposed network architecture, we investigated the trade-off between quality enhancement and network complexity. Experimental results reveal that the proposed DCRN achieves a better peak signal-to-noise ratio and structural similarity index measure for compressed Joint Photographic Experts Group (JPEG) images compared to previous methods.
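A sketch following the abstract's description of one DC block: dense layers whose outputs are concatenated with all earlier features, followed by a 1x1 bottleneck that restores the channel count. Channel counts and growth rate are illustrative assumptions:

```python
# One densely cascading (DC) block: dense concatenation plus 1x1 bottleneck.
import torch
import torch.nn as nn

class DCBlock(nn.Module):
    def __init__(self, dim=64, growth=32, num_dense=5):
        super().__init__()
        self.dense = nn.ModuleList(
            nn.Sequential(nn.Conv2d(dim + i * growth, growth, 3, padding=1), nn.ReLU())
            for i in range(num_dense)
        )
        self.bottleneck = nn.Conv2d(dim + num_dense * growth, dim, 1)

    def forward(self, x):
        feats = [x]
        for layer in self.dense:              # each layer sees all earlier outputs
            feats.append(layer(torch.cat(feats, dim=1)))
        return self.bottleneck(torch.cat(feats, dim=1))

print(DCBlock()(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```

The bottleneck is what keeps three stacked DC blocks affordable: without it, the channel count would grow by `num_dense * growth` at every block.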
- Research Article
- 10.1016/j.sigpro.2022.108589
- Apr 14, 2022
- Signal Processing
Multi-scale spatial-spectral attention network for multispectral image compression based on variational autoencoder
- Research Article
- 10.1109/access.2020.2999965
- Jan 1, 2020
- IEEE Access
Deep convolutional neural networks (CNNs) have made impressive achievements in the field of image restoration. However, most deep CNN-based models have a limited capability to exploit hierarchical features, and these features are often treated equally, which restricts restoration performance. To address this issue, the present work proposes a novel memory-based latent attention network (MLANet) that aims to effectively restore a high-quality image from a corresponding low-quality one. The key idea is to employ a memory-based latent attention block (MLAB), which is stacked throughout MLANet and makes better use of global and local features across the network. Specifically, the MLAB contains a main branch and a latent branch: the former extracts local multi-level features, and the latter preserves global information through its latent structure. Furthermore, a multi-kernel attention module is incorporated into the latent branch to adaptively learn more effective features with mixed attention. To validate its effectiveness and generalization ability, MLANet is evaluated on three representative image restoration tasks: image super-resolution, image denoising, and image compression artifact reduction. Experimental results show that MLANet performs better than the state-of-the-art methods on all three tasks.
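The MLAB internals are only outlined in the abstract; this toy version assumes "multi-kernel attention" means parallel convolutions with different kernel sizes whose outputs are fused by learned per-branch weights:

```python
# Parallel multi-kernel branches fused by softmax-normalized channel gating.
import torch
import torch.nn as nn

class MultiKernelAttention(nn.Module):
    def __init__(self, dim, kernels=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(dim, dim, k, padding=k // 2) for k in kernels
        )
        self.weigh = nn.Sequential(           # per-branch gate from pooled features
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, len(kernels), 1), nn.Softmax(dim=1),
        )

    def forward(self, x):
        w = self.weigh(x)                     # (B, num_branches, 1, 1)
        outs = [b(x) for b in self.branches]
        return sum(w[:, i:i + 1] * o for i, o in enumerate(outs))

print(MultiKernelAttention(32)(torch.randn(1, 32, 16, 16)).shape)
```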
- Research Article
- 10.1109/joe.2023.3235058
- Jul 1, 2023
- IEEE Journal of Oceanic Engineering
Low bit-rate compression is required for underwater images because of the limited bandwidth of underwater acoustic communication, which limits the performance of both machine analysis and human perception in most underwater applications. Few existing compression methods consider the unique characteristics of underwater images, such as color shift and haze effects, to better fulfill the requirements of various applications at low bit rates. To address this problem, we propose a novel extreme underwater image compression framework that provides scalability to support both machine vision and human vision with the assistance of underwater priors. Specifically, the base layer is composed of a feature extractor and a generator, where global structural edges and high-level features of regions with a significant impact on machine analysis are extracted and used to reconstruct a feature-matching image for analysis purposes. Considering the negative influence of underwater imaging on machine vision, a feature degradation removal module guided by underwater priors is proposed to alleviate feature-level degradation by taking analysis-friendly enhanced images as auxiliary information. For the enhancement layer, which targets human vision, the residual between the original image and the reconstruction from the base layer is compressed. A feature attention block and a background light recovery block are designed to exploit features extracted from enhanced images and underwater priors before recovering the original scene with good perceptual quality at low bit rates. Experimental results demonstrate the superiority of our framework in both machine vision tasks and perceptual quality compared with traditional compression methods and learning-based methods.
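A minimal sketch of the two-layer idea only: the base layer serves machine vision and the enhancement layer codes the residual for human viewing. The encode/decode functions here are placeholders, not the paper's networks:

```python
# Scalable coding skeleton: base layer for analysis, residual layer for viewing.
import torch

def encode_base(image):        # placeholder for the edge/feature extractor
    return image.mean(dim=1, keepdim=True)

def decode_base(code):         # placeholder for the feature-matching generator
    return code.repeat(1, 3, 1, 1)

def scalable_encode(image):
    base_code = encode_base(image)
    residual = image - decode_base(base_code)   # enhancement-layer input
    return base_code, residual

img = torch.randn(1, 3, 64, 64)
base, res = scalable_encode(img)
machine_view = decode_base(base)                # enough for analysis tasks
human_view = machine_view + res                 # full reconstruction for viewing
print(torch.allclose(human_view, img))          # True in this lossless toy case
```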
- Research Article
- 10.1117/1.jei.34.4.043012
- Jul 8, 2025
- Journal of Electronic Imaging
Image compression plays a central role in coping with large-scale image storage and transmission. Although traditional compression methods, such as HEVC and VVC, have made significant progress in compression performance, they still have limitations when handling complex image content. Recently, end-to-end learned image compression (LIC) methods have achieved a superior balance between bitrate and distortion through global optimization, surpassing conventional standards on several evaluation metrics. However, current LIC methods often overlook the differences among feature channels. To address this issue, we propose an end-to-end LIC framework based on a mixture attention block (MAB), which optimizes the representation of channel features. The MAB integrates orthogonal-channel spatial attention (OCSA) and Transformer-based window self-attention, adaptively adjusting the weight of each channel to enhance both global and local feature modeling. To further enhance the spatial perception of the model, we introduce large-kernel spatial attention in OCSA. In addition, we propose a Swin Transformer V2-based channel-wise autoregressive entropy model (S2CAEM), which improves the probability-estimation accuracy of the latent representations through a channel-wise autoregressive strategy and consequently boosts compression efficiency. Extensive experiments show that the proposed method achieves state-of-the-art rate-distortion performance compared with existing LIC methods. Specifically, it outperforms VTM-14.0 in Bjøntegaard delta rate by 16.76%, 15.15%, and 14.96% on the Kodak, Tecnick, and CLIC Pro datasets, respectively.
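A minimal sketch of the channel-wise autoregressive strategy: the latent is split into channel groups, and each group's entropy parameters are predicted from all previously decoded groups. The Swin-V2 parameter network is replaced by a plain convolution here for brevity:

```python
# Channel-wise autoregression over latent channel groups.
import torch
import torch.nn as nn

class ChannelARModel(nn.Module):
    def __init__(self, latent=192, groups=4):
        super().__init__()
        self.g = latent // groups
        self.param_nets = nn.ModuleList(
            nn.Conv2d(max(i * self.g, 1), 2 * self.g, 3, padding=1)  # -> (mean, scale)
            for i in range(groups)
        )

    def forward(self, y):
        decoded, params = [], []
        for i, net in enumerate(self.param_nets):
            # group 0 has no channel context; a zero plane stands in for it
            ctx = torch.cat(decoded, dim=1) if decoded else torch.zeros_like(y[:, :1])
            params.append(net(ctx))                 # entropy parameters for group i
            decoded.append(y[:, i * self.g:(i + 1) * self.g])
        return torch.cat(params, dim=1)

print(ChannelARModel()(torch.randn(1, 192, 8, 8)).shape)  # (1, 384, 8, 8)
```

Unlike spatial autoregression, all positions within a group decode in parallel, which is what makes channel-wise models fast while still conditioning on earlier latents.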