Articles published on learned-image-compression
124 Search results
- Research Article
8
- 10.1109/tmm.2024.3416831
- Jan 1, 2024
- IEEE Transactions on Multimedia
- Wei Jiang + 5 more
The effective receptive field (ERF) plays an important role in transform coding: it determines how much redundancy can be removed during the transform and how many spatial priors can be utilized to synthesize textures during the inverse transform. Existing methods rely either on stacks of small kernels, whose ERFs remain insufficiently large, or on heavy non-local attention mechanisms, which limit the potential of high-resolution image coding. To tackle this issue, we propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression (LLIC). Specifically, for the first time in the learned image compression community, we introduce a few large kernel-based depth-wise convolutions to reduce more redundancy while maintaining modest complexity. Due to the wide range of image diversity, we further propose a mechanism to augment convolution adaptability through the self-conditioned generation of weights. The large kernels cooperate with non-linear embedding and gate mechanisms for better expressiveness and lighter pointwise interactions. Our investigation extends to refined training methods that unlock the full potential of these large kernels. Moreover, to promote more dynamic inter-channel interactions, we introduce an adaptive channel-wise bit allocation strategy that autonomously generates channel importance factors in a self-conditioned manner. To demonstrate the effectiveness of the proposed transform coding, we align the entropy model to compare with existing transform methods and obtain the models LLIC-STF, LLIC-ELIC, and LLIC-TCM. Extensive experiments demonstrate that our proposed LLIC models significantly improve over the corresponding baselines and reduce the BD-Rate by 9.49%, 9.47%, and 10.94%, respectively, on Kodak over VTM-17.0 Intra. Our LLIC models achieve state-of-the-art performance and better trade-offs between performance and complexity.
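To make the receptive-field argument concrete, the sketch below compares the theoretical receptive field of stacked small kernels with that of one large kernel, and the weight count of a depth-wise versus a standard convolution. It is an illustrative Python sketch, not the LLIC implementation; the kernel size of 11 and the width of 192 channels are hypothetical choices. The effective receptive field is typically smaller than this theoretical bound, so the numbers are upper limits.

```python
def stacked_receptive_field(kernel_size: int, num_layers: int) -> int:
    """Theoretical receptive field (pixels per side) of stacked stride-1 k x k convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weight_count(channels: int, kernel_size: int, depthwise: bool) -> int:
    """Weights (biases omitted) of a channels -> channels convolution."""
    if depthwise:
        # One k x k filter per channel; no cross-channel mixing.
        return channels * kernel_size * kernel_size
    # Standard convolution: every output channel sees every input channel.
    return channels * channels * kernel_size * kernel_size


if __name__ == "__main__":
    channels = 192  # hypothetical latent width, not taken from the LLIC paper
    print(stacked_receptive_field(3, 4))                     # 9
    print(stacked_receptive_field(11, 1))                    # 11
    print(conv_weight_count(channels, 3, depthwise=False))   # 331776
    print(conv_weight_count(channels, 11, depthwise=True))   # 23232
```

Under these assumptions, one 11 x 11 depth-wise layer covers a wider area than four stacked 3 x 3 layers while using roughly 14x fewer weights than even a single standard 3 x 3 layer, which is the kind of complexity trade-off the abstract describes.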
- Research Article
4
- 10.1109/lsp.2024.3411524
- Jan 1, 2024
- IEEE Signal Processing Letters
- Farhad Pakdaman + 1 more
The emerging Learned Compression (LC) approach replaces the traditional codec modules with Deep Neural Networks (DNNs), which are trained end-to-end for rate-distortion performance. This approach is considered the future of image/video compression, and major efforts have been dedicated to improving its compression efficiency. However, most proposed works target compression efficiency by employing more complex DNNs, which leads to higher computational complexity. Alternatively, this paper proposes to improve compression by fully exploiting the existing DNN capacity. To do so, the latent representation is guided to learn a richer and more diverse set of features, which leads to better reconstruction. A channel-wise feature decorrelation loss is designed and integrated into the LC optimization. Three strategies are proposed and evaluated, which optimize (1) the transformation network, (2) the context model, and (3) both networks. Experimental results on two established LC methods show that the proposed method improves compression by up to 8.06% in BD-Rate, with no added complexity. The proposed method can be applied as a plug-and-play solution to optimize any similar LC method.
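The abstract does not give the exact form of the decorrelation loss. A common way to express channel-wise decorrelation, assumed here, is to penalize the squared off-diagonal entries of the channel correlation matrix of the latent. The Python sketch below does this for a latent stored as a list of flattened channels; the function name and the `eps` guard are hypothetical.

```python
import math
from typing import List


def channel_decorrelation_loss(latent: List[List[float]], eps: float = 1e-8) -> float:
    """Mean squared off-diagonal Pearson correlation between latent channels.

    `latent` holds one list per channel, each flattened to the same length.
    """
    num_ch = len(latent)
    if num_ch < 2:
        raise ValueError("need at least two channels")
    n = len(latent[0])
    # Center and normalize each channel so that dot products become correlations.
    normed = []
    for ch in latent:
        mean = sum(ch) / n
        centered = [v - mean for v in ch]
        std = math.sqrt(sum(c * c for c in centered) / n)
        normed.append([c / (std + eps) for c in centered])
    total = 0.0
    for i in range(num_ch):
        for j in range(num_ch):
            if i != j:
                corr = sum(a * b for a, b in zip(normed[i], normed[j])) / n
                total += corr * corr  # penalize correlation of either sign
    return total / (num_ch * (num_ch - 1))
```

In training, such a term would be added to the rate-distortion objective with its own weight, for example `rate + lmbda * distortion + beta * decorrelation`, where `beta` is a hypothetical hyperparameter.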
- Research Article
13
- 10.1109/tip.2024.3445737
- Jan 1, 2024
- IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
- Haisheng Fu + 6 more
Deep learning-based image compression has made great progress recently. However, some leading schemes use a serial context-adaptive entropy model to improve the rate-distortion (R-D) performance, which is very slow. In addition, the complexity of the encoding and decoding networks is quite high and not suitable for many practical applications. In this paper, we propose four techniques to balance the trade-off between complexity and performance. First, we introduce the deformable residual module to remove more redundancy in the input image, thereby enhancing compression performance. Second, we design an improved checkerboard context model with two separate distribution parameter estimation networks and different probability models, which enables parallel decoding without sacrificing performance compared to the sequential context-adaptive model. Third, we develop a three-pass knowledge distillation scheme to retrain the decoder and entropy coding and to reduce the complexity of the core decoder network; it transfers both the final and intermediate results of the teacher network to the student network to improve its performance. Fourth, we introduce L1 regularization to make the latent representation sparser, and we only encode non-zero channels in the encoding and decoding process to reduce the bit rate. This also reduces the encoding and decoding time. Experiments show that, compared to the state-of-the-art learned image coding scheme, our method is about 20 times faster in encoding and 70-90 times faster in decoding, and its R-D performance is also 2.3% higher. Our method achieves better rate-distortion performance than classical image codecs, including H.266/VVC-intra (4:4:4), and some recent learned methods, as measured by both PSNR and MS-SSIM on the Kodak and Tecnick-40 datasets.
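The parallel decoding mentioned above relies on a checkerboard split of the latent positions: anchors are decoded first, in parallel, and every non-anchor can then use its already-decoded anchor neighbours as context. The sketch below shows only this generic two-pass split in Python; it does not model the paper's improved context model with two separate parameter estimation networks, and all names are hypothetical.

```python
from typing import List, Set, Tuple

Pos = Tuple[int, int]


def checkerboard_split(height: int, width: int) -> Tuple[List[Pos], List[Pos]]:
    """Split latent positions into anchors (pass 1) and non-anchors (pass 2)."""
    anchors: List[Pos] = []
    non_anchors: List[Pos] = []
    for i in range(height):
        for j in range(width):
            # Which parity counts as "anchor" is a convention; here (i + j) is even.
            if (i + j) % 2 == 0:
                anchors.append((i, j))
            else:
                non_anchors.append((i, j))
    return anchors, non_anchors


def anchor_context(pos: Pos, anchors: Set[Pos], height: int, width: int) -> List[Pos]:
    """4-connected neighbours of `pos` that are anchors, i.e. already decoded in pass 1."""
    i, j = pos
    neighbours = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for a, b in neighbours
            if 0 <= a < height and 0 <= b < width and (a, b) in anchors]


if __name__ == "__main__":
    anchors, non_anchors = checkerboard_split(4, 4)
    print(len(anchors), len(non_anchors))              # 8 8
    print(anchor_context((0, 1), set(anchors), 4, 4))  # [(1, 1), (0, 0), (0, 2)]
```

Because every 4-connected neighbour of a non-anchor is an anchor, the second pass needs no sequential ordering, which is what removes the serial bottleneck of an autoregressive context model.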
- Research Article
4
- 10.1016/j.dsp.2023.104339
- Dec 5, 2023
- Digital Signal Processing
- Jianxu Wang + 1 more
A region-based hierarchical image compression method with simulated visual perception
- Research Article
7
- 10.1016/j.engappai.2023.107596
- Dec 4, 2023
- Engineering Applications of Artificial Intelligence
- Bo Li + 6 more
Learned image compression via neighborhood-based attention optimization and context modeling with multi-scale guiding
- Research Article
5
- 10.1109/tcsvt.2023.3273578
- Dec 1, 2023
- IEEE Transactions on Circuits and Systems for Video Technology
- Fatih Kamisli
With the increasing popularity of deep learning in image processing, many learned lossless image compression methods have been proposed recently. One group of algorithms is based on scale-based auto-regressive models, which can provide competitive compression performance while also allowing easily parallelized computations and short encoding/decoding times. However, they use large neural networks and have high computational requirements. This paper presents an interpolation-based learned lossless image compression method that falls into the scale-based auto-regressive model group. The method achieves compression performance better than or on par with recent scale-based auto-regressive models, yet requires more than 10x fewer neural network parameters (0.19M) and more than 10x less encoding/decoding computation. These achievements are due to contributions and findings in the overall system and neural network architecture design, such as sharing interpolator neural networks across different scales, using separate neural networks for different parameters of the probability distribution model, and performing the processing in the YCoCg-R color space instead of the RGB color space.
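YCoCg-R is a reversible, integer-only lifting transform, which is what makes it usable for lossless coding. The Python sketch below implements the standard forward and inverse YCoCg-R lifting steps; it is not taken from the paper's code.

```python
from typing import Tuple


def rgb_to_ycocg_r(r: int, g: int, b: int) -> Tuple[int, int, int]:
    """Forward YCoCg-R lifting: integer in, integer out."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y: int, co: int, cg: int) -> Tuple[int, int, int]:
    """Inverse lifting: undo the forward steps in reverse order with the same shifts."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    print(rgb_to_ycocg_r(255, 0, 0))        # (63, 255, -127)
    print(ycocg_r_to_rgb(63, 255, -127))    # (255, 0, 0)
```

The chroma channels Co and Cg need one extra bit of dynamic range (for 8-bit RGB they span -255 to 255). Python's `>>` floors negative numbers, and because the inverse repeats exactly the same shifts, the round trip is lossless.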
- Research Article
38
- 10.1109/tcsvt.2023.3276442
- Dec 1, 2023
- IEEE Transactions on Circuits and Systems for Video Technology
- Tong Chen + 1 more
Deep neural network-based image compression has been extensively studied. However, model robustness, which is crucial to practical applications, is largely overlooked. We propose to examine the robustness of prevailing learned image compression models by injecting negligible adversarial perturbations into the original source image. Severe distortion in the decoded reconstruction reveals a general vulnerability in existing methods regardless of their settings (e.g., network architecture, loss function, quality scale). A variety of defense strategies, including geometric self-ensemble-based pre-processing and adversarial training, are investigated against the adversarial attack to improve the model's robustness. The defense efficiency is then further exemplified in real-life image recompression case studies. Overall, our methodology is simple, effective, and generalizable, making it attractive for developing robust learned image compression solutions. All materials are made publicly accessible at https://njuvision.github.io/RobustNIC for reproducible research.
- Research Article
5
- 10.1016/j.jvcir.2023.103990
- Nov 24, 2023
- Journal of Visual Communication and Image Representation
- Yang Sui + 6 more
Corner-to-Center long-range context model for efficient learned image compression
- Research Article
44
- 10.1109/tcsvt.2023.3237274
- Aug 1, 2023
- IEEE Transactions on Circuits and Systems for Video Technology
- Haisheng Fu + 5 more
Recently, deep learning-based image compression has made significant progress and has achieved better rate-distortion (R-D) performance than the latest traditional method, H.266/VVC, in both the MS-SSIM metric and the more challenging PSNR metric. However, a major problem is that the complexity of many leading learned schemes is too high. In this paper, we propose an efficient and effective image coding framework, which achieves R-D performance similar to the state of the art with lower complexity. First, we develop an improved multi-scale residual block (MSRB) that can expand the receptive field and capture global information more efficiently, which further reduces the spatial correlation of the latent representations. Second, an importance scaling network is introduced to directly scale the latents to achieve content-adaptive bit allocation without sending side information, which is more flexible than previous importance map methods. Third, we apply a post-quantization filter (PQF) to reduce the quantization error, motivated by the Sample Adaptive Offset (SAO) filter in video coding. Moreover, our experiments show that the performance of the system is less sensitive to the complexity of the decoder. Therefore, we design an asymmetric paradigm in which the encoder employs three stages of MSRBs to improve the learning capacity, whereas the decoder uses only one stage of MSRB, which reduces decoder complexity while still yielding satisfactory performance. Experimental results show that, compared to the state-of-the-art method, encoding and decoding with the proposed method are about 17 times faster, while the R-D performance is reduced by only about 1% on both the Kodak and Tecnick-40 datasets, which is still better than H.266/VVC (4:4:4) and other leading learning-based methods. Our source code is publicly available at https://github.com/fengyurenpingsheng.
- Research Article
18
- 10.1609/aaai.v37i1.25184
- Jun 26, 2023
- Proceedings of the AAAI Conference on Artificial Intelligence
- Xuhao Jiang + 4 more
Image-based single-modality compression learning approaches have demonstrated exceptionally powerful encoding and decoding capabilities in the past few years, but they suffer from blur and severe semantic loss at extremely low bitrates. To address this issue, we propose a multimodal machine learning method for text-guided image compression, in which the semantic information of text is used as prior information to guide image compression for better compression performance. We fully study the role of text description in different components of the codec and demonstrate its effectiveness. In addition, we adopt the image-text attention module and image-request complement module to better fuse image and text features, and propose an improved multimodal semantic-consistent loss to produce semantically complete reconstructions. Extensive experiments, including a user study, show that our method obtains visually pleasing results at extremely low bitrates and achieves comparable or even better performance than state-of-the-art methods, even though those methods operate at 2x to 4x our bitrate.
- Research Article
8
- 10.1016/j.sigpro.2023.109128
- Jun 3, 2023
- Signal Processing
- Youneng Bao + 5 more
Taylor series based dual-branch transformation for learned image compression
- Research Article
16
- 10.1109/tcsvt.2022.3229701
- Jun 1, 2023
- IEEE Transactions on Circuits and Systems for Video Technology
- Shaohui Li + 5 more
Progressive coding is essential to the practical deployment of learned image compression over heterogeneous networks and clients. Existing methods for learned progressive image compression require complex and empirical design to achieve near-optimal rate-distortion performance over a wide range of bit-rates. However, these methods are limited by the implicit learned mechanism based on neural networks and by the introduction of uniform quantizers. In this paper, we propose generalized learned progressive image compression with analytic rate-distortion optimization using dead-zone quantizers on the latent representation. Specifically, we reveal that dead-zone quantizers, as a general case of uniform quantizers, are equivalent to uniform quantizers in fixed-rate nonlinear transform coding and can prevent extra redundancy in embedded quantization for progressive coding. Consequently, we propose rate-distortion optimized learned progressive coding by approximating the optimal quantizer in the source space using dead-zone quantizers in an analytic manner on the Laplacian source. To the best of our knowledge, this paper is the first to achieve general learned progressive coding from the perspective of optimal quantizers. The proposed method achieves theoretically sound and practically efficient embedded quantization and learned progressive coding of latent representations with improved rate-distortion performance. It can also enable embedded quantization with diverse assignments of truncation points and support flexible configuration of quality layers of varying numbers and at varying target bit-rates. Furthermore, we successfully incorporate the proposed method into existing pre-trained fixed-rate models to realize progressive learned image compression without re-training. Experimental results demonstrate that the proposed method achieves state-of-the-art rate-distortion performance in learned progressive image compression compared with traditional codecs and recent learned methods.
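For reference, a textbook dead-zone scalar quantizer can be written with a rounding offset: an offset of 0.5 gives ordinary uniform rounding (halves away from zero), while smaller offsets widen the zero bin. The Python sketch below assumes this parameterization; it does not implement the paper's analytic rate-distortion optimization on the Laplacian source, and the names are hypothetical.

```python
import math


def deadzone_quantize(x: float, step: float, rounding_offset: float = 0.5) -> int:
    """q = sign(x) * floor(|x| / step + rounding_offset).

    rounding_offset = 0.5 is uniform rounding (halves away from zero);
    smaller values enlarge the dead zone around zero.
    """
    q = math.floor(abs(x) / step + rounding_offset)
    return q if x >= 0 else -q


def dequantize(q: int, step: float) -> float:
    """Simple reconstruction at q * step (reconstruction offsets are another design choice)."""
    return q * step


def zero_bin_width(step: float, rounding_offset: float) -> float:
    """Width of the interval of x values that quantize to 0."""
    return 2.0 * (1.0 - rounding_offset) * step


if __name__ == "__main__":
    print(deadzone_quantize(0.6, 1.0))        # 1
    print(deadzone_quantize(0.6, 1.0, 0.3))   # 0
    print(deadzone_quantize(-1.2, 1.0))       # -1
```

The zero bin covers |x| < (1 - rounding_offset) * step, so its width is 2 * (1 - rounding_offset) * step; with an offset of 0.5 it has the same width as every other bin, which is the uniform special case.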
- Research Article
4
- 10.3390/electronics12102289
- May 18, 2023
- Electronics
- Hu Shao + 5 more
Deep learning-based image compression techniques can take advantage of the autoencoder's benefits to achieve greater compression quality at the same bit rate as traditional image compression, which better meets user needs. Designing a high-performance processor that can increase the inference speed and efficiency of the deep learning image compression (DIC) network is important to make this technology more widely employed in mobile devices. To the best of our knowledge, there is no dedicated processor that can accelerate DIC with low power consumption, and general-purpose network accelerators based on field programmable gate arrays (FPGA) cannot directly process compressed networks, so we propose a processor suitable for DIC in this paper. First, we analyze the image compression algorithm and quantize the network data into 16-bit fixed-point values using dynamic hierarchical quantization. Then, we design an operation module, which is the core computational part for processing. It is composed of convolution, sampling, and normalization units, which pipeline the inference calculation for each layer of the network. To achieve high-throughput inference computing, a processing elements group (PEG) array with local buffers is developed for convolutional computation. Based on the common components in encoding and decoding, the sampling and normalization units are compatible with codec computation and are utilized for image compression with time-sharing multiplexing. According to the control signal, the operation module can change the order of data flow through the three units so that they perform encoding and decoding operations, respectively. Based on these design methods and schemes, DIC is deployed on the Xilinx Zynq ZCU104 development board to achieve high-throughput image compression at 6 different bit rates. The experimental results show that the processor can run at 200 MHz and achieve 283.4 GOPS for the 16-bit fixed-point DIC network.
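The abstract does not detail its dynamic hierarchical quantization. As an assumption, the sketch below shows a generic per-tensor dynamic fixed-point scheme in Python: the number of fractional bits for a 16-bit word is chosen from the largest magnitude in the tensor, and values are rounded and saturated. All function names are hypothetical.

```python
import math
from typing import List


def choose_frac_bits(values: List[float], total_bits: int = 16) -> int:
    """Pick fractional bits for one tensor from its largest magnitude (dynamic fixed point)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return total_bits - 1
    int_bits = math.floor(math.log2(max_abs)) + 1  # may be negative for small tensors
    return total_bits - 1 - int_bits  # one bit is reserved for the sign


def to_fixed(values: List[float], frac_bits: int, total_bits: int = 16) -> List[int]:
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    # Round to nearest, then saturate so values just below a power of two cannot overflow.
    return [min(hi, max(lo, round(v * scale))) for v in values]


def from_fixed(codes: List[int], frac_bits: int) -> List[float]:
    scale = 2.0 ** frac_bits
    return [c / scale for c in codes]


if __name__ == "__main__":
    weights = [0.5, -3.7, 1.25]  # hypothetical layer weights
    f = choose_frac_bits(weights)
    print(f)                      # 13
    print(to_fixed(weights, f))   # [4096, -30310, 10240]
```

As a quick sanity check on the reported throughput, 283.4 GOPS at 200 MHz corresponds to 283.4e9 / 200e6 = 1417 operations per clock cycle, counted however the authors define an operation.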
- Research Article
20
- 10.3390/rs15082211
- Apr 21, 2023
- Remote Sensing
- Chuan Fu + 1 more
Learned image compression has achieved a series of breakthroughs for natural images, but there is little literature focusing on high-resolution remote sensing image (HRRSI) datasets. This paper focuses on designing a learned lossy image compression framework for compressing HRRSIs. Considering the local and non-local redundancy contained in HRRSIs, a mixed hyperprior network is designed to explore both kinds of redundancy in order to improve the accuracy of entropy estimation. In detail, a transformer-based hyperprior and a CNN-based hyperprior are fused for entropy estimation. Furthermore, to reduce the mismatch between training and testing, a three-stage training strategy is introduced to refine the network. In this training strategy, the entire network is first trained, and then some sub-networks are fixed while the others are trained. To evaluate the effectiveness of the proposed compression algorithm, experiments are conducted on an HRRSI dataset. The results show that the proposed algorithm achieves comparable or better compression performance than some traditional and learned image compression algorithms, such as Joint Photographic Experts Group (JPEG) and JPEG2000. At a similar or lower bitrate, the proposed algorithm achieves a PSNR about 2 dB higher than JPEG2000.
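PSNR, the metric behind the 2 dB comparison, is `10 * log10(MAX^2 / MSE)`. The Python sketch below computes it and also converts a PSNR gain into the corresponding MSE reduction. The default `max_value=255` assumes 8-bit samples; remote sensing imagery is often stored at higher bit depths, in which case `max_value` must be changed.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], decoded: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB over flattened samples."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_psnr_gain(gain_db: float) -> float:
    """Factor by which the MSE must shrink to gain `gain_db` dB of PSNR."""
    return 10.0 ** (gain_db / 10.0)
```

A 2 dB gain corresponds to an MSE that is about 10^0.2, or roughly 1.58 times, lower.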
- Research Article
22
- 10.1109/tcsvt.2022.3216713
- Apr 1, 2023
- IEEE Transactions on Circuits and Systems for Video Technology
- Youneng Bao + 5 more
Recently, remarkable progress has been made in learned image compression (LIC), in which nonlinear transforms (NTs) play a crucial role. Although there are many NT methods for improving rate-distortion performance, all existing methods pay for it with higher computational complexity and more transform parameters. This paper provides a fundamentally novel viewpoint on nonlinear transforms from a communication perspective and shows how this idea can be extended to design efficient NT methods. In particular, nonlinear transforms are interpreted as signal modulation modules. Under this interpretation, current NTs are generalized as amplitude modulation, which varies only the amplitude of the carrier wave. Therefore, a nonlinear modulation-like transform (NMLT), which varies the phase angle of the carrier, is proposed. Moreover, this concept is extended by introducing In-phase/Quadrature (IQ) modulation, a boosting technique in the communication field, to enhance NMLT. Furthermore, the bit-interleaving technique from communication is used to guide the optimization of NMLT with IQ. Experimental results on different datasets and backbone architectures verify the efficiency and robustness of the proposed methods. For example, when the backbone architecture is the hyperprior model, our method achieves a 19.37% BD-rate reduction over GDN on the Kodak dataset. In addition, our method with the channel-wise autoregressive model achieves state-of-the-art rate-distortion performance.
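BD-rate figures such as the 19.37% reduction above are Bjøntegaard delta rates. The Python sketch below follows the original cubic formulation under a simplifying assumption: exactly four rate-distortion points per codec, so the cubic fit of log-rate versus PSNR is an exact interpolation. Simpson's rule is exact for cubics, so the integrals are exact too. It is a hypothetical helper, not the evaluation script used by the authors, and other variants (for example piecewise cubic interpolation) can give slightly different numbers.

```python
import math
from typing import Sequence


def _lagrange_eval(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _integrate_cubic(xs: Sequence[float], ys: Sequence[float], lo: float, hi: float) -> float:
    # Simpson's rule is exact for polynomials of degree <= 3.
    mid = 0.5 * (lo + hi)
    f = lambda x: _lagrange_eval(xs, ys, x)
    return (hi - lo) / 6.0 * (f(lo) + 4.0 * f(mid) + f(hi))


def bd_rate(rate_anchor: Sequence[float], psnr_anchor: Sequence[float],
            rate_test: Sequence[float], psnr_test: Sequence[float]) -> float:
    """Bjontegaard delta rate in percent (negative = bit savings), four points per curve."""
    if not (len(rate_anchor) == len(psnr_anchor) == len(rate_test) == len(psnr_test) == 4):
        raise ValueError("this sketch expects exactly four R-D points per curve")
    log_a = [math.log10(r) for r in rate_anchor]
    log_t = [math.log10(r) for r in rate_test]
    # Integrate only over the PSNR range covered by both curves.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    avg_diff = (_integrate_cubic(psnr_test, log_t, lo, hi)
                - _integrate_cubic(psnr_anchor, log_a, lo, hi)) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0
```

A negative result means the test codec needs fewer bits for the same quality, so "a 19.37% BD-rate reduction" corresponds to a value of about -19.37 from a function like this.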
- Research Article
3
- 10.1007/s11042-023-14975-0
- Mar 7, 2023
- Multimedia Tools and Applications
- Roohan Aziz + 2 more
Block based learned image compression
- Research Article
33
- 10.1109/tnnls.2021.3104974
- Mar 1, 2023
- IEEE Transactions on Neural Networks and Learning Systems
- Mu Li + 5 more
The entropy of the codes usually serves as the rate loss in recent learned lossy image compression methods. Precise estimation of the probability distribution of the codes plays a vital role in reducing the entropy and boosting the joint rate-distortion performance. However, existing deep learning-based entropy models generally assume that the latent codes are statistically independent or depend only on some side information or local context, which fails to take the global similarity within the context into account and thus hinders accurate entropy estimation. To address this issue, we propose a special nonlocal operation for context modeling that employs the global similarity within the context. Specifically, due to the constraint of the context, a nonlocal operation cannot be computed directly in context modeling. We exploit the relationship between the code maps produced by deep neural networks and introduce proxy similarity functions as a workaround. Then, we combine the local and the global context via a nonlocal attention block and employ it in masked convolutional networks for entropy modeling. Considering that the width of the transforms is essential in training low-distortion models, we finally introduce a U-net block into the transforms to increase the width with manageable memory consumption and time complexity. Experiments on the Kodak and Tecnick datasets demonstrate the superiority of the proposed context-based nonlocal attention block in entropy modeling and of the U-net block in low-distortion settings. On the whole, our model performs favorably against existing image compression standards and recent deep image compression models.
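Masked convolutions in context models typically use a causal mask such as PixelCNN's "type A" mask, in which only positions already decoded in raster-scan order are visible. This causal constraint is also why an unconstrained non-local operation cannot be computed directly in the context model. The Python sketch below builds such a mask; it illustrates the generic constraint, not the paper's proxy similarity functions, and the function name is hypothetical.

```python
from typing import List


def causal_mask(kernel_size: int) -> List[List[int]]:
    """1 where a kernel tap sees an already-decoded position (raster order), else 0.

    The centre tap is masked too ("type A"), so a symbol never sees itself.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    c = kernel_size // 2
    return [[1 if (i < c or (i == c and j < c)) else 0 for j in range(kernel_size)]
            for i in range(kernel_size)]


if __name__ == "__main__":
    for row in causal_mask(5):
        print(row)
```

Multiplying a convolution kernel element-wise by this mask before applying it guarantees that the predicted distribution of each latent depends only on already-decoded neighbours, so the decoder can reproduce it exactly.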
- Research Article
- 10.1117/1.jei.32.1.013003
- Jan 11, 2023
- Journal of Electronic Imaging
- Yuan Shi + 2 more
The conventional image compression framework is pixel-fidelity-driven and can generate compressed images with considerable visual quality even at low bit rates. However, these methods emphasize the human visual experience and ignore the needs of machine recognition tasks. To this end, we propose an image compression framework that utilizes multiscale prior information extracted from a machine perception model to improve the machine recognition accuracy of compressed images. Specifically, an interaction refinement module (IRM) is designed to let the multiscale priors interact with each other, adaptively retaining machine recognition-relevant features to strengthen their expression in the compact features. To further improve machine recognition accuracy, a machine vision perceptual loss is designed based on a semantic variation weight, which reflects the degree of semantic variation between adjacent deep layers in the multiscale priors. The machine vision perceptual loss is used to optimize the semantic distortion of compressed images so that important semantic information is retained. Experimental results show that, compared with compression methods including BPG, WebP, Mentzer, NIC, IUWD, and RCIS, the proposed method improves Top-1 recognition accuracy by 10.9%, 19%, 11.6%, 12.9%, 6%, and 2.7%, respectively, at a lower bit rate (0.2 bpp). In addition, the performance improvement on other machine recognition networks and machine vision tasks shows the versatility of the proposed method.
- Research Article
77
- 10.1109/tip.2023.3263099
- Jan 1, 2023
- IEEE Transactions on Image Processing
- Haisheng Fu + 9 more
Recently, deep learning-based image compression methods have made significant progress and gradually outperformed traditional approaches, including the latest standard Versatile Video Coding (VVC), in both PSNR and MS-SSIM metrics. Two key components of learned image compression are the entropy model of the latent representations and the encoding/decoding network architectures. Various models have been proposed, such as autoregressive, softmax, logistic mixture, Gaussian mixture, and Laplacian models. Existing schemes use only one of these models. However, due to the vast diversity of images, it is not optimal to use one model for all images, or even for different regions within one image. In this paper, we propose a more flexible discretized Gaussian-Laplacian-Logistic mixture model (GLLMM) for the latent representations, which can adapt to different contents in different images and different regions of one image more accurately and efficiently, given the same complexity. Besides, in the encoding/decoding network design, we propose concatenated residual blocks (CRB), in which multiple residual blocks are serially connected with additional shortcut connections. The CRB can improve the learning ability of the network, which further improves compression performance. Experimental results on the Kodak, Tecnick-100, and Tecnick-40 datasets show that the proposed scheme outperforms all the leading learning-based methods and existing compression standards, including VVC intra coding (4:4:4 and 4:2:0), in terms of PSNR and MS-SSIM. The source code is available at https://github.com/fengyurenpingsheng.
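A discretized mixture assigns each integer latent value the probability mass of a continuous mixture over its quantization bin [y - 0.5, y + 0.5], and the ideal code length is `-log2 P(y)`. The Python sketch below mixes Gaussian, Laplacian, and logistic components in that way. The component parameters are hypothetical (in the paper they are predicted by the network), the weights are assumed to sum to one, and the probability floor is an added safeguard.

```python
import math
from typing import List, Tuple

# (family, weight, mean, scale); weights are assumed to sum to one.
Component = Tuple[str, float, float, float]


def gaussian_cdf(x: float, mu: float, scale: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (scale * math.sqrt(2.0))))


def laplace_cdf(x: float, mu: float, scale: float) -> float:
    z = (x - mu) / scale
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def logistic_cdf(x: float, mu: float, scale: float) -> float:
    z = (x - mu) / scale
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


CDFS = {"gaussian": gaussian_cdf, "laplace": laplace_cdf, "logistic": logistic_cdf}


def discretized_mixture_prob(y: int, components: List[Component],
                             prob_floor: float = 1e-9) -> float:
    """Probability mass of integer y: each component is integrated over [y - 0.5, y + 0.5]."""
    p = 0.0
    for family, weight, mu, scale in components:
        cdf = CDFS[family]
        p += weight * (cdf(y + 0.5, mu, scale) - cdf(y - 0.5, mu, scale))
    return max(p, prob_floor)  # avoid log(0) for symbols far in the tails


def bits(y: int, components: List[Component]) -> float:
    """Ideal code length of y in bits."""
    return -math.log2(discretized_mixture_prob(y, components))
```

With one component per family this reduces to a three-way mixture; a practical entropy model would predict the weights, means, and scales per latent element and feed the resulting probabilities to an arithmetic coder.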
- Research Article
39
- 10.1109/tmm.2021.3130754
- Jan 1, 2023
- IEEE Transactions on Multimedia
- Changsheng Gao + 3 more
Instead of being observed by humans, multimedia data are increasingly fed into machines to perform various kinds of semantic analysis. One image may be analyzed multiple times by different machine vision algorithms for different purposes. While machine vision-oriented image compression has been studied, existing methods are usually driven by a specific machine vision task and may not be applicable to other tasks. We address task-generic image compression, in the hope that an image is compressed once but used multiple times for different tasks, all with satisfactory performance. Our study is based on end-to-end learned image compression. We focus on the distortion metric, i.e., finding a task-agnostic metric to estimate the quality of reconstructed images. On the one hand, we study deep feature distance as the metric, which transforms images into a latent space using a pretrained convolutional network (the latent space is believed to be better aligned with semantics) and calculates the distance in that latent space. On the other hand, inspired by the saliency mechanism, we study an importance-weighted pixel distance as the metric, where the weights are generated to reflect the importance of the pixels to semantics. Moreover, we combine the two distances into one metric to investigate their complementary nature. An extensive set of experiments is performed to evaluate these metrics. Experimental results show that the combined metric performs best and leads to 20.79% to 42.69% bit savings under the same semantic analysis performance, compared to the same network optimized for signal fidelity. Interestingly, we observe that using the combined metric also improves the visual quality of the reconstructed images.
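The abstract does not say how the two distances are combined. Assuming a simple weighted sum, the Python sketch below computes an importance-weighted pixel MSE, a feature-space MSE, and their combination. `feature_fn` stands in for the pretrained network, and `alpha` is a hypothetical mixing weight; neither is taken from the paper.

```python
from typing import Callable, List, Sequence

FeatureFn = Callable[[Sequence[float]], List[float]]


def weighted_pixel_distance(ref: Sequence[float], dec: Sequence[float],
                            weights: Sequence[float]) -> float:
    """Importance-weighted MSE; weights are assumed non-negative with a positive sum."""
    total_w = sum(weights)
    return sum(w * (a - b) ** 2 for a, b, w in zip(ref, dec, weights)) / total_w


def feature_distance(ref: Sequence[float], dec: Sequence[float], feature_fn: FeatureFn) -> float:
    """MSE between features of the two images (feature_fn stands in for a pretrained CNN)."""
    fr, fd = feature_fn(ref), feature_fn(dec)
    return sum((a - b) ** 2 for a, b in zip(fr, fd)) / len(fr)


def combined_distance(ref: Sequence[float], dec: Sequence[float], weights: Sequence[float],
                      feature_fn: FeatureFn, alpha: float = 0.5) -> float:
    """Hypothetical weighted sum of the feature distance and the weighted pixel distance."""
    return (alpha * feature_distance(ref, dec, feature_fn)
            + (1.0 - alpha) * weighted_pixel_distance(ref, dec, weights))
```

In a learned codec, such a distance would replace plain MSE as the distortion term of the rate-distortion loss, with the importance weights produced per pixel by a separate saliency-style network.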