
Deep Image Compression Toward Machine Vision: A Unified Optimization Framework

Abstract

There is a growing consensus that machine vision is gradually replacing human vision in numerous tasks, following the demonstrated success of artificial intelligence. In this paper, we propose a deep image compression scheme towards machine vision, guided by the principle of “begin with the end in mind”. In particular, a unified optimization scheme for end-to-end image compression towards machine vision is proposed, accompanied by dedicated variable bitrate coding and generalized rate-accuracy optimization. The presented framework, which jointly optimizes the compression and the machine vision networks, exploits the full potential of robust machine vision on compressed images. The variable bitrate modules towards machine vision, which effectively shrink the storage space for model parameters, are further developed to accommodate real-world applications. Moreover, an iterative algorithm is presented to achieve optimality in terms of the generalized rate-accuracy towards machine vision. Experimental results show that the proposed framework achieves state-of-the-art object detection performance among end-to-end image compression methods. In the exploration of Video Coding for Machines (VCM) in the Moving Picture Experts Group (MPEG), the proposed framework achieves 31.69% and 23.96% BD-rate gains on the official VCM test datasets, the Open Images dataset and the TVD dataset respectively, compared with the results generated using the state-of-the-art Versatile Video Coding (VVC) standard. The generalization capability of the proposed framework is also verified with instance segmentation under various scenarios.
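
As a rough illustration of what a rate-accuracy objective can look like, the following is a generic Lagrangian form and not the paper's own equation. All symbols are assumptions introduced here: $\theta$ for the compression network parameters, $\phi$ for the machine vision network parameters, $R$ for the bitrate, $\mathcal{L}_{\text{task}}$ for the task loss (for example a detection loss), $\hat{x}_{\theta}$ for the decoded image, $y$ for the task label, and $\lambda$ for the trade-off weight.

```latex
\min_{\theta,\,\phi} \; R(x;\theta) \;+\; \lambda \, \mathcal{L}_{\text{task}}\bigl(f_{\phi}(\hat{x}_{\theta}),\, y\bigr)
```

Under this form, a larger $\lambda$ weights task accuracy more heavily than bitrate, and minimizing over both $\theta$ and $\phi$ corresponds to the joint optimization of the compression and machine vision networks described in the abstract.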

Similar Papers
  • Research Article
  • Cite Count Icon 3
  • 10.1109/access.2023.3263207
Versatile Video Coding-Based Coding Tree Unit Level Image Compression With Dual Quantization Parameters for Hybrid Vision
  • Jan 1, 2023
  • IEEE Access
  • Shin Kim + 2 more

Image analysis based on machine vision is widely used in the smart industry. Good-quality images are required for accurate machine analysis, but handling high-definition images can be problematic in a constrained environment such as a low-bandwidth network or low-capacity storage. Lowering the image resolution might be a straightforward way to reduce image data, but it causes substantial information loss, which degrades machine vision performance. Moreover, human supervision may be necessary for contingencies that machine vision cannot handle. Therefore, an image compression method that considers both machine and human vision is required, one that offers higher compression efficiency than the state-of-the-art codec, strong machine vision performance, and human-recognizable quality. In this paper, we propose Versatile Video Coding (VVC) based image compression for hybrid vision, i.e., machine vision and human vision. Our work provides coding tree unit (CTU) level image compression with dual quantization parameters (QPs) according to the quantization parameter map and the saliency extracted by the object detection network: in the salient region, the proposed method maintains high quality with a low QP, while in the non-salient region it degrades the quality with a high QP. Compared with VVC, the proposed compression method achieves a bitrate reduction of up to 25.5% in machine vision tasks, demonstrating higher compression efficiency while retaining good machine vision performance. From the perspective of human vision, the proposed method provides human-perceptible image quality, preserving acceptable objective quality values.
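
The dual-QP idea in this abstract can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: the function name `assign_ctu_qps`, the saliency threshold, and the two QP values are hypothetical, and the saliency scores are assumed to be already averaged per CTU.

```python
from typing import List


def assign_ctu_qps(
    saliency: List[List[float]],
    threshold: float = 0.5,   # hypothetical saliency cut-off
    qp_salient: int = 27,     # hypothetical low QP (higher quality)
    qp_background: int = 42,  # hypothetical high QP (lower quality)
) -> List[List[int]]:
    """Build a per-CTU QP map from a grid of per-CTU saliency scores.

    CTUs whose saliency reaches the threshold get the low QP; all other
    CTUs get the high QP.
    """
    if qp_salient > qp_background:
        raise ValueError("qp_salient should not exceed qp_background")
    return [
        [qp_salient if score >= threshold else qp_background for score in row]
        for row in saliency
    ]


# Example: a 2x2 grid of CTUs with saliency scores in [0, 1].
qp_map = assign_ctu_qps([[0.9, 0.1], [0.5, 0.0]])
```

A real encoder would take this map as per-CTU QP input; how that map is passed to a VVC encoder is outside the scope of this sketch.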

  • Research Article
  • Cite Count Icon 24
  • 10.1109/tip.2023.3251020
CBANet: Toward Complexity and Bitrate Adaptive Deep Image Compression Using a Single Network.
  • Jan 1, 2023
  • IEEE Transactions on Image Processing
  • Jinyang Guo + 2 more

In this work, we propose a new deep image compression framework called Complexity and Bitrate Adaptive Network (CBANet) that aims to learn one single network to support variable bitrate coding under various computational complexity levels. In contrast to the existing state-of-the-art learning-based image compression frameworks that only consider the rate-distortion trade-off without introducing any constraint related to the computational complexity, our CBANet considers the complex rate-distortion-complexity trade-off when learning a single network to support multiple computational complexity levels and variable bitrates. Since it is a non-trivial task to solve such a rate-distortion-complexity related optimization problem, we propose a two-step approach to decouple this complex optimization task into a complexity-distortion optimization sub-task and a rate-distortion optimization sub-task, and additionally propose a new network design strategy by introducing a Complexity Adaptive Module (CAM) and a Bitrate Adaptive Module (BAM) to respectively achieve the complexity-distortion and rate-distortion trade-offs. As a general approach, our network design strategy can be readily incorporated into different deep image compression methods to achieve complexity and bitrate adaptive image compression by using a single network. Comprehensive experiments on two benchmark datasets demonstrate the effectiveness of our CBANet for deep image compression. Code is released at https://github.com/JinyangGuo/CBANet-release.

  • Research Article
  • Cite Count Icon 9
  • 10.1016/j.jvcir.2021.103226
Deep image compression with multi-stage representation
  • Jul 21, 2021
  • Journal of Visual Communication and Image Representation
  • Zixi Wang + 3 more


  • Research Article
  • Cite Count Icon 1592
  • 10.1109/tcsvt.2021.3101953
Overview of the Versatile Video Coding (VVC) Standard and its Applications
  • Oct 1, 2021
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Benjamin Bross + 6 more

Versatile Video Coding (VVC) was finalized in July 2020 as the most recent international video coding standard. It was developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) to serve an ever-growing need for improved video compression as well as to support a wider variety of today’s media content and emerging applications. This paper provides an overview of the novel technical features for new applications and the core compression technologies for achieving significant bit rate reductions in the neighborhood of 50% over its predecessor for equal video quality, the High Efficiency Video Coding (HEVC) standard, and 75% over the currently most-used format, the Advanced Video Coding (AVC) standard. It is explained how these new features in VVC provide greater versatility for applications. Highlighted applications include video with resolutions beyond standard- and high-definition, video with high dynamic range and wide color gamut, adaptive streaming with resolution changes, computer-generated and screen-captured video, ultralow-delay streaming, 360° immersive video, and multilayer coding e.g., for scalability. Furthermore, early implementations are presented to show that the new VVC standard is implementable and ready for real-world deployment.

  • Research Article
  • 10.1109/tmm.2026.3651136
Learned Image Compression Via Local-to-Global Cross-Component Prior
  • Jan 1, 2026
  • IEEE Transactions on Multimedia
  • Wenhong Duan + 6 more

Learned image compression (LIC) methods have shown promising results and achieved superior performance compared to traditional image compression methods. However, because cross-component correlations are not exploited, there is still potential for further performance improvement. In this paper, we first explore the inter-channel correlations of different color spaces and transform the image compression problem in RGB color space into that in YUV color space, which has cross-component prior information. We propose a novel image compression method that leverages local-to-global cross-component prior modeling, utilizing a cross-component attention mechanism to improve coding performance. First, we design the cross-component prior gate (CPG) to model the cross-component prior information based on an attention mechanism. As is common knowledge in data compression, the luma component (Y) contains more details and textural/structural information than the chroma components (UV). The proposed method can make full use of the cross-component guidance information from luma to chroma components to achieve effective image compression. Experimental results demonstrate that the proposed method can achieve superior performance compared to existing learned image compression methods. The proposed method can achieve 9.20% rate savings compared to the image compression standard Versatile Video Coding (VVC) Test Model (VTM-11.0) on the Kodak dataset.

  • Research Article
  • Cite Count Icon 68
  • 10.1109/tcsvt.2022.3199472
Joint Graph Attention and Asymmetric Convolutional Neural Network for Deep Image Compression
  • Jan 1, 2023
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Zhisen Tang + 5 more

Recent deep image compression methods have achieved prominent progress by using nonlinear modeling and powerful representation capabilities of neural networks. However, most existing learning-based image compression approaches employ customized convolutional neural network (CNN) to utilize visual features by treating all pixels equally, neglecting the effect of local key features. Meanwhile, the convolutional filters in CNN usually express the local spatial relationship within the receptive field and seldom consider the long-range dependencies from distant locations. This results in the long-range dependencies of latent representations not being fully compressed. To address these issues, an end-to-end image compression method is proposed by integrating graph attention and asymmetric convolutional neural network (ACNN). Specifically, ACNN is used to strengthen the effect of local key features and reduce the cost of model training. Graph attention is introduced into image compression to address the bottleneck problem of CNN in modeling long-range dependencies. Meanwhile, regarding the limitation that existing attention mechanisms for image compression hardly share information, we propose a self-attention approach which allows information flow to achieve reasonable bit allocation. The proposed self-attention approach is in compliance with the perceptual characteristics of human visual system, as information can interact with each other via attention modules. Moreover, the proposed self-attention approach takes into account channel-level relationship and positional information to promote the compression effect of rich-texture regions. Experimental results demonstrate that the proposed method achieves state-of-the-art rate-distortion performances after being optimized by MS-SSIM compared to recent deep compression models on the benchmark datasets of Kodak and Tecnick. 
The project page with the source code can be found at https://mic.tongji.edu.cn.

  • Research Article
  • Cite Count Icon 2
  • 10.1109/tcsvt.2025.3525664
Task–Adapted Learnable Embedded Quantization for Scalable Human-Machine Image Compression
  • May 1, 2025
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Shaohui Li + 7 more

Image compression for both human and machine vision has become prevalent to accommodate rising demands for machine-machine and human-machine communications. Scalable human-machine image compression has recently emerged as an efficient alternative that simultaneously achieves high accuracy for machine vision in the base layer and high-fidelity reconstruction for human vision in the enhancement layer. However, existing methods achieve scalable coding with heuristic mechanisms, which cannot fully exploit the inter-layer correlations and evidently sacrifice rate-distortion performance. In this paper, we propose task-adapted learnable embedded quantization to address this problem in an analytically optimized fashion. We first reveal the relationship between the latent representations for machine and human vision and demonstrate that the optimal representation for machine vision can be approximated with post-training optimization on the learned representation for human vision. On this basis, we propose task-adapted learnable embedded quantization that leverages a learnable step predictor to adaptively determine the optimal quantization step for diverse machine vision tasks, such that inter-layer correlations between representations for human and machine vision are sufficiently exploited using embedded quantization. Furthermore, we develop a human-machine scalable coding framework by incorporating the proposed embedded quantization into pre-trained learned image compression models. Experimental results demonstrate that the proposed framework achieves state-of-the-art performance on machine vision tasks like object detection, instance segmentation, and panoptic segmentation with negligible loss in rate-distortion performance for human vision.

  • Research Article
  • Cite Count Icon 115
  • 10.1109/tcsvt.2021.3087706
Transform Coding in the VVC Standard
  • Oct 1, 2021
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Xin Zhao + 7 more

In the past decade, the development of transform coding techniques has achieved significant progress and several advanced transform tools have been adopted in the new generation Versatile Video Coding (VVC) standard. In this paper, a brief history of transform coding development during VVC standardization is presented, and the transform coding tools in the VVC standard are described in detail together with their initial design, incremental improvements and implementation aspects. To improve coding efficiency, four new transform coding techniques are introduced in VVC, namely Multiple Transform Selection (MTS), Low-Frequency Non-separable Secondary Transform (LFNST), Sub-Block Transform (SBT), and a large (64-point) type-2 DCT. The experimental results on VVC reference software (VTM-9.0) show that an average overall coding gain of 4.5% and 3.6% can be achieved by the VVC transform coding tools for All Intra and Random Access configurations, respectively.

  • Conference Article
  • Cite Count Icon 2
  • 10.1109/icon.2002.1033293
Rate renegotiation algorithm with dynamic prediction window for efficient transport of streaming VBR MPEG coded video over ATM networks
  • Nov 7, 2002
  • P Markov + 1 more

For video sources, the Moving Picture Experts Group (MPEG) compression scheme has become the de facto standard for video compression. However, even with the huge reduction of bits that MPEG compression provides, it does not smooth the video traffic. Indeed, the variable bit rate (VBR) MPEG compression algorithm guarantees that the MPEG stream will be bursty. A service in which an asynchronous transfer mode (ATM) client requests and receives VBR MPEG coded video sequences from an ATM server is considered. An algorithm for streaming VBR MPEG coded video delivery over ATM networks, which dynamically allocates the transmission parameters, is proposed. A scheme for optimal choice of the prediction window's size is also presented. The results obtained show that the proposed dynamic allocation algorithm can provide an efficient solution for VBR MPEG coded video transport with guaranteed quality of service (QoS) over ATM networks.

  • Research Article
  • Cite Count Icon 5
  • 10.1002/int.22769
Deep image compression with lifting scheme: Wavelet transform domain based on high‐frequency subband prediction
  • Dec 8, 2021
  • International Journal of Intelligent Systems
  • M I Anju + 1 more

Image compression is one of the most important image processing methods and is deployed extensively in different appliances. The “Discrete wavelet transform (DWT)” is one of the well-adopted transform methods used for compressing images. The most widely deployed version of DWT is convolution-oriented. Nevertheless, the lifting-oriented DWT scheme deserves more attention for its more efficient performance and lower computation cost. This paper proposes a deep learning-based image compression model with a lifting scheme for predicting high-frequency subbands. Moreover, the fine-tuning in lifting factorization is done by a new Sea Lion with Averaged Update Evaluation that includes new cosine estimation under the COordinate Rotation DIgital Computer algorithm. Similarly, this study defines a new single objective function that merges multiple constraints, such as “Peak Signal to Noise Ratio (PSNR) and Compression Ratio (CR)”. Finally, the superiority of the presented approach is demonstrated with respect to various measures, such as CR, PSNR and so on.
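
To make the lifting idea concrete, here is a minimal sketch of one level of an integer Haar wavelet transform written as a predict step and an update step. It is not the paper's model: the fixed predictor "odd sample equals its even neighbour" stands in for the learned high-frequency subband prediction, and the function names are hypothetical.

```python
from typing import List, Tuple


def haar_lifting_forward(signal: List[int]) -> Tuple[List[int], List[int]]:
    """One level of integer Haar lifting: returns (approximation, detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    even, odd = signal[0::2], signal[1::2]
    # Predict step: estimate each odd sample from its even neighbour and
    # keep only the prediction error (the high-frequency subband).
    detail = [o - e for e, o in zip(even, odd)]
    # Update step: adjust the even samples so they track the pair average
    # (the low-frequency subband). The shift is a floor division by 2.
    approx = [e + (d >> 1) for e, d in zip(even, detail)]
    return approx, detail


def haar_lifting_inverse(approx: List[int], detail: List[int]) -> List[int]:
    """Undo the lifting steps in reverse order for exact reconstruction."""
    even = [a - (d >> 1) for a, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    signal: List[int] = []
    for e, o in zip(even, odd):
        signal.extend((e, o))
    return signal


approx, detail = haar_lifting_forward([5, 7, 3, 1, 10, 10])
restored = haar_lifting_inverse(approx, detail)
```

Because each lifting step is inverted by subtracting exactly what was added, reconstruction is lossless regardless of how good the predictor is; a better predictor only makes the detail coefficients smaller and cheaper to code.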

  • Research Article
  • Cite Count Icon 1
  • 10.3390/electronics12194042
Transform-Based Feature Map Compression Method for Video Coding for Machines (VCM)
  • Sep 26, 2023
  • Electronics
  • Minhun Lee + 6 more

The burgeoning field of machine vision has led to the development by the Moving Picture Experts Group (MPEG) of a new type of compression technology called video coding for machines (VCM), to enhance machine recognition through video information compression. This research proposes a principal component analysis (PCA)-based compression methodology for multi-level feature maps extracted from the feature pyramid network (FPN) structure. Unlike current PCA-based studies that independently carry out PCA for each feature map, our approach employs a generalized basis matrix and mean vector derived from channel correlations by a generalized PCA process to eliminate the need for a PCA process. Further compression is achieved by amalgamating high-dimensional feature maps, capitalizing on the spatial redundancy within these multi-level feature maps. As a result, the proposed VCM encoder forgoes the PCA process, and the generalized data do not incur any compression loss. It only requires compressing the coefficients for each feature map using versatile video coding (VVC). Experimental results demonstrate superior performance by our method over all feature anchors for each machine vision task, as specified by the MPEG-VCM common test conditions, outperforming previous PCA-based feature map compression methods. Notably, it achieved an 89.3% BD-rate reduction for instance segmentation tasks.

  • Conference Article
  • Cite Count Icon 3
  • 10.1109/icc.1996.541258
Computation of effective bandwidth of aggregated VBR MPEG video traffic in ATM networks using the modified equivalent capacity
  • Jun 23, 1996
  • Chang Bum Lee + 2 more

A method for computing the effective bandwidth of aggregated variable bit rate (VBR) Moving Picture Experts Group (MPEG) video traffic is proposed. First, individual MPEG traffic streams are split into I, B, and P frame traffic according to the frame type, and the respective I, B, and P frame traffic is aggregated, where transform expand sample (TES) processes are employed for modeling the MPEG traffic. Second, we compute the statistical characteristics of the aggregated I frame traffic, aggregated B frame traffic, and aggregated P frame traffic using the individual I, B, and P frame traffic, where the statistical characteristics represent the mean, second and third central moments, and the lag 1 autocorrelation of the bit rate of the traffic. Next, the effective bandwidth of the aggregated I frame traffic is computed by the Gaussian bound. We calculate the statistical characteristics of the combined B and P frame traffic using those of the aggregated B frame and P frame traffic, and estimate the effective bandwidth of the combined B and P frame traffic using the modified equivalent capacity. Finally, we compute the total effective bandwidth of the aggregated VBR MPEG traffic by adding the Gaussian bound of the aggregated I frame traffic and the modified equivalent capacity of the combined B and P frame traffic. Computer simulation shows that the proposed method provides a good estimate of the total effective bandwidth of the aggregated VBR MPEG traffic.

  • Conference Article
  • Cite Count Icon 19
  • 10.1109/fuzzy.2009.5277082
An intelligent video streaming technique in zigbee wireless
  • Aug 1, 2009
  • H B Kazemian

This paper is concerned with an intelligent application of Moving Picture Experts Group (MPEG) video transmission over IEEE 802.15.4 – ZigBee. MPEG Variable Bit Rate (VBR) video is data hungry and suffers excessive time delay and data loss over wireless communication. Conventional rate policing such as the generic cell rate algorithm is inadequate to regulate the transmission of VBR data sources over bandwidth-limited ZigBee. Therefore, with conventional policing it is impossible to transmit MPEG VBR video over a ZigBee channel. A buffer entitled ‘traffic-shaping buffer’ is introduced to prevent excessive overflow of MPEG video data over the ZigBee channel. A new Neural-Fuzzy (NF) scheme is developed to adjust the traffic-shaping buffer output rate to eliminate unacceptable delay or loss of the VBR encoded video and to conform the data to the token-bucket's contract prior to entering the ZigBee channel. A Rule-Based Fuzzy (RBF) scheme is developed to monitor the data rate entering the traffic-shaper, in order to prevent either saturation or starvation of the buffer. The simulation results show that the use of the NF scheme and the RBF scheme enables MPEG VBR video to be transmitted over ZigBee.
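
The token-bucket contract mentioned in this abstract can be illustrated with a small sketch. This is a generic token bucket with discrete time ticks, not the paper's Neural-Fuzzy controller; the class name, the tick-based clock, and the parameter values are assumptions made for illustration.

```python
class TokenBucket:
    """Generic token bucket: `rate` tokens arrive per tick, up to `capacity`."""

    def __init__(self, rate: int, capacity: int) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # the bucket starts full

    def tick(self) -> None:
        """Advance time by one tick and refill, never exceeding capacity."""
        self.tokens = min(self.capacity, self.tokens + self.rate)

    def try_send(self, size: int) -> bool:
        """Send `size` units if enough tokens are available.

        A non-conforming burst is rejected and consumes no tokens; a shaper
        would hold it in a buffer and retry after later ticks.
        """
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


bucket = TokenBucket(rate=2, capacity=5)
first = bucket.try_send(4)   # conforms: 5 tokens available, 1 left
second = bucket.try_send(2)  # does not conform: only 1 token left
bucket.tick()                # refill to 3 tokens
third = bucket.try_send(2)   # conforms again
```

In the setting the abstract describes, the traffic-shaping buffer sits in front of such a contract and its output rate is what the NF scheme adjusts so that bursts conform before entering the channel.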

  • Conference Article
  • Cite Count Icon 25
  • 10.1109/icip.2018.8451411
Deep Image Compression with Iterative Non-Uniform Quantization
  • Oct 1, 2018
  • Jianrui Cai + 1 more

Image compression, which aims to represent an image with less storage space, is a classical problem in image processing. Recently, by training an encoder-quantizer-decoder network, deep convolutional neural networks (CNNs) have achieved promising results in image compression. As a nondifferentiable part of the compression system, the quantizer is hard to update during network training. Most existing deep image compression methods adopt a uniform rounding function as the quantizer, which however restricts the capability and flexibility of CNNs in compressing complex image structures. In this paper, we present an iterative non-uniform quantization scheme for deep image compression. More specifically, we alternately optimize the quantizer and the encoder-decoder. When the encoder-decoder is fixed, a non-uniform quantizer is optimized based on the distribution of representation features. The encoder-decoder network is then updated by fixing the quantizer. Extensive experiments demonstrate the superior PSNR index of the proposed method compared with existing deep compressors and JPEG2000.
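
The quantizer half of this alternating scheme can be sketched as follows. The abstract does not specify how the quantizer is fitted, so this example substitutes a plain Lloyd-Max style (one-dimensional k-means) update on a fixed list of feature values; the function names and sample data are hypothetical, and the encoder-decoder update step is omitted.

```python
from typing import List, Sequence


def nearest_level(value: float, levels: Sequence[float]) -> int:
    """Index of the reconstruction level closest to `value`."""
    return min(range(len(levels)), key=lambda i: abs(value - levels[i]))


def fit_nonuniform_quantizer(
    features: Sequence[float],
    init_levels: Sequence[float],
    iterations: int = 20,
) -> List[float]:
    """Fit quantization levels to the feature distribution (encoder fixed)."""
    levels = list(init_levels)
    for _ in range(iterations):
        # Assignment step: group features by their nearest level.
        buckets: List[List[float]] = [[] for _ in levels]
        for value in features:
            buckets[nearest_level(value, levels)].append(value)
        # Update step: move each level to the mean of its group; a level
        # with no assigned features keeps its previous position.
        levels = [
            sum(bucket) / len(bucket) if bucket else level
            for bucket, level in zip(buckets, levels)
        ]
    return sorted(levels)


def quantize(value: float, levels: Sequence[float]) -> float:
    """Map a feature value to its nearest reconstruction level."""
    return levels[nearest_level(value, levels)]


levels = fit_nonuniform_quantizer([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], [0.0, 1.0])
```

In a full alternating loop, the fitted levels would then be frozen while the encoder-decoder is trained, and the two steps repeated; unlike uniform rounding, the levels end up concentrated where the features actually fall.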

  • Research Article
  • Cite Count Icon 4
  • 10.1109/access.2023.3260223
A Super-Resolution-Based Feature Map Compression for Machine-Oriented Video Coding
  • Jan 1, 2023
  • IEEE Access
  • Jung-Heum Kang + 7 more

Recently, video and image compression methods using neural networks have received much attention. In MPEG standardization, Video Coding for Machines (VCM) is a newly arising topic which attempts to compress features/images for the purpose of machine vision tasks. In particular, compressing features has advantages in terms of privacy protection and computation off-loading. In this paper, we propose an effective feature compression method equipped with a super-resolution (SR) module for features. Our main motivation comes from the observation that features are somewhat robust to spatial distortions (e.g., AWGN, blur, quantization distortions, coding artifacts), which leads us to integrate an SR module into the compression framework. We also explore the best training strategy for the proposed method, i.e., finding the best combination of various losses and proper input feature shapes. Our comprehensive experiments show that the proposed method outperforms the baseline in the original VCM anchor scenario on various QP values with Versatile Video Coding (VVC). Specifically, the proposed framework achieved up to 50% BD-rate reduction compared to the conventional P-layer feature map compression method for the object detection task on the OpenImage dataset.
