Learning Image and Video Compression Through Spatial-Temporal Energy Compaction
Compression has been an important research topic for many decades, with a significant impact on data transmission and storage. Recent advances have shown the great potential of learning image and video compression. Inspired by related work, in this paper we present an image compression architecture using a convolutional autoencoder, and then generalize image compression to video compression by adding an interpolation loop to both the encoder and decoder sides. Our basic idea is to realize spatial-temporal energy compaction in learning image and video compression. To this end, we propose adding a spatial energy compaction-based penalty to the loss function to achieve higher image compression performance. Furthermore, based on the temporal energy distribution, we propose selecting the number of frames in one interpolation loop, adapting to the motion characteristics of the video content. Experimental results demonstrate that our proposed image compression outperforms the latest image compression standard in terms of the MS-SSIM quality metric, and provides higher performance than state-of-the-art learning compression methods at high bit rates, which benefits from our spatial energy compaction approach. Meanwhile, our proposed video compression approach with temporal energy compaction significantly outperforms MPEG-4 and is competitive with the commonly used H.264. Both our image and video compression produce more visually pleasing results than traditional standards.
- Research Article
2
- 10.33103/uot.ijccce.23.1.11
- Mar 30, 2023
- Iraqi Journal of Computer, Communication, Control and System Engineering
Image compression is a step in image processing concerned with the transmission and storage of digitally created images. Fractal coding is a promising image and video compression approach with good reconstruction fidelity and relatively large compression ratios. Because of its simplicity and strong performance, fractal image compression (FIC) is a particularly popular approach in image compression applications. However, it has a significant disadvantage in the form of a long encoding time, because encoding even a small piece of the image requires a massive similarity search over the original data. As a result, many published papers aim to reduce the FIC search time while keeping the quality of the reconstructed images at an acceptable level, and this remains an active research topic. Fractal images are self-similar, in that each individual part resembles the whole. This paper discusses the attempts of many authors working on image and video compression using fractal compression techniques based on various approaches. Each discussion focuses on the main compression parameters, such as compression ratio (CR), peak signal-to-noise ratio (PSNR), and encoding time, as well as the details of the data set used for testing each technique for fractal video and image compression. Index Terms: FIC, Video compression, parallel processing, Iteration Function System (IFS), Image processing.
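The two quality and size measures named in the abstract above, PSNR and CR, can be sketched as follows. This is a minimal illustration, not code from the surveyed papers: the function names `psnr` and `compression_ratio`, the flat-list pixel representation, and the 8-bit peak value of 255 are assumptions made for the example.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized
    flat sequences of pixel values (assumed 8-bit, so peak=255)."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def compression_ratio(original_bytes, compressed_bytes):
    """Compression ratio (CR): original size divided by compressed size."""
    return original_bytes / compressed_bytes
```

A higher PSNR means the reconstruction is closer to the original, and a higher CR means a smaller compressed file; fractal coders are typically compared on both together with encoding time.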
- Conference Article
4
- 10.1109/vcip49819.2020.9301753
- Dec 1, 2020
The DCT-based transform coding technique has been adopted by the international standards (ISO JPEG, ITU H.261/264/265, ISO MPEG-2/4/H, and many others) for nearly 30 years. Although researchers are still trying to improve its efficiency by fine-tuning its components and parameters, the basic structure has not changed in the past two decades. The recently developed deep learning technology may provide a new direction for constructing a high-compression image/video coding system. Recent results, particularly from the Challenge on Learned Image Compression (CLIC) at CVPR, indicate that this new type of scheme (often trained end-to-end) may have good potential for further improving compression efficiency. In the first part of this tutorial, we shall (1) summarize briefly the progress of this topic in the past 3 or so years, including an overview of CLIC results and the JPEG AI Call-for-Evidence Challenge on Learning-based Image Coding (issued in early 2020). Because Deep Neural Network (DNN)-based image compression is a new area, several techniques and structures have been tested. The recently published autoencoder-based schemes can achieve similar PSNR to BPG (Better Portable Graphics, H.265 still image standard) and have superior subjective quality (e.g., MS-SSIM), especially at very low bit rates. In the second part, we shall (2) address the detailed design concepts of image compression algorithms using the autoencoder structure. In the third part, we shall switch gears to (3) explore the emerging area of DNN-based video compression. Recent publications in this area have indicated that end-to-end trained video compression can achieve comparable or superior rate-distortion performance to HEVC/H.265. The CLIC at CVPR 2020 also created for the first time a new track dedicated to P-frame coding.
- Book Chapter
- 10.5772/9301
- Mar 1, 2010
Image and video compression schemes are implemented for the optimum reconstruction of images with respect to speed and quality. The LSCIC (Layered Scalable Concurrent Image Compression) pre-coder is introduced here to make the best use of available resources and obtain a reasonably good image or video even at low system bandwidth. This pre-coder splits the input data, whether video or image, into layers and, after synchronization, sends it to the pre-coder output on two different layers at the same time. Before addressing the image compression problem, it is important to become familiar with the standard image formats used in particular applications, mainly JPEG, GIF, and TIFF. The image compression scenario is the main entity to be included in the dissertation as per our project requirement. A new idea for scalable concurrent image compression is introduced which gives superior image reconstruction performance compared to existing techniques. This can be verified by calculating the gray level and PSNR of the reconstructed image. The bit stream must be compressed for image data transfer if the main system requirements are memory saving and fast transformation with little sacrifice in image quality under a lossy compression scheme. A valuable study of the parallel implementation of image and video compression was carried out by K. Shen (1997). It suggests that an ideal algorithm should have a low compressed data rate, high visual quality of the decoded image/video, and low computational complexity. In hardware approaches, special parallel architectures can be designed to accelerate computation, as suggested by R. J. Gove (1994) and Shinji Komori et al. (1988). Parallel video compression algorithms can be implemented using either hardware or software approaches, as shown by V. Bhaskaran (1995). These techniques provide guidelines for dealing with digital image compression schemes from the point of view of speed and complexity.
For video compression, motion estimation has its own importance, and different motion estimation techniques have already been presented to obtain good image quality. Encoding is the first step of compression, followed by decoding at the receiving (reconstruction) end. An intermediate step in data, image, and video compression is the transform. Different transform techniques have been used depending on the application.
- Dissertation
- 10.33915/etd.13084
- Jan 1, 2025
The rapid advancement of information technology and the exponential growth of digital communication have significantly increased the demand for efficient data compression techniques that reduce storage requirements, minimize bandwidth consumption, and accelerate data transmission—without substantially compromising data quality. This dissertation addresses these challenges by investigating and developing advanced learned image compression (LIC) methods, with a particular focus on lossy compression for both natural images and scientific imagery obtained from NASA’s Solar Dynamics Observatory (SDO) mission. Traditional image compression standards—such as JPEG, JPEG2000, BPG, and HEVC—rely on manually engineered transforms and heuristic rules, which often lack the adaptability required to accommodate diverse visual content and application-specific constraints. In contrast, learned image compression employs deep neural networks trained in an end-to-end manner, guided by principles from rate–distortion theory, to optimize the trade-off between compression efficiency and reconstruction fidelity. In the first part of this dissertation, several technical challenges in developing neural image compression codecs for natural images (general-purpose) are addressed, including the design of expressive nonlinear transforms, accurate entropy modeling, and the integration of perceptually meaningful loss functions. To this end, several learned image compression frameworks are proposed, each introducing distinct design innovations: a Transformer-based nonlinear transform that captures both local and global dependencies, an advanced entropy model that improves probability estimation and coding efficiency, and a conditional diffusion-based generative framework that enhances the perceptual quality of reconstructed images. The second part focuses on the application of learned compression to imagery from NASA’s Solar Dynamics Observatory (SDO) mission. 
A learned video compression framework is developed to exploit both spatial and temporal redundancies in solar image sequences. Furthermore, an adaptive compression strategy is introduced to prioritize scientific relevance: images containing solar flare events are compressed at lower ratios to preserve critical information, whereas non-flare images are compressed more aggressively to maximize storage and transmission efficiency. Collectively, these contributions advance the field of learned image compression across both general-purpose and scientific imaging domains, providing practical solutions for improving data transmission and storage efficiency in real-world and mission-critical environments.
- Book Chapter
1
- 10.1007/978-3-031-31417-9_33
- Jan 1, 2023
More than 80 percent of online traffic is video and image traffic, and this share will likely rise in the coming years. Images and video have multiple dimensions along which the data rate can grow, such as frame resolution, frame depth, and multi-view representation. It is therefore crucial to compress these images and videos efficiently. The lack of sufficient experimental data is a major setback for the development of image and video compression based on deep learning models. This study presents a new kind of data set for the research community with the goal of advancing the state of the art in image compression using deep learning models. The proposed data set consists of each image and its corresponding VVC (Versatile Video Coding) compressed image as a label of the input image, for two quantization parameters. The images were captured in different states of the Indian subcontinent and contain common objects in their natural context; the campus of the Indian Institute of Technology Madras, which is rich in flora and fauna and home to several rare wildlife species; scenes from the Himalayas; clouds in Cherrapunji; indoor scenes; and more. The data set will be made publicly available to the research community. A statistical analysis of the data set is presented along with a coding analysis under the VVC compression standard.
- Conference Article
3
- 10.1109/icip40778.2020.9190974
- Oct 1, 2020
With recent advances in learned entropy and context models, the rate-distortion performance of deep learned image compression methods has reached or surpassed that of conventional codecs. However, learned image compression is currently more complex and slower than conventional image compression. Learned image and video compression methods almost exclusively employ the generalized divisive normalization (GDN) activation function. This paper investigates the effect of the activation function on the performance of image compression in terms of both objective and subjective criteria as well as runtime. In particular, we show that the distribution of latents produced by hard shrinkage fits a Laplacian better, and that it is possible to achieve similar rate-distortion and better visual performance using hard shrinkage with lower complexity.
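For readers unfamiliar with the hard shrinkage activation mentioned above, a minimal sketch follows. The function name `hard_shrink` and the threshold value `lam=0.5` are assumptions for illustration; the abstract does not state which threshold the authors use.

```python
def hard_shrink(x, lam=0.5):
    """Hard shrinkage: pass x through unchanged when |x| exceeds the
    threshold lam, otherwise output exactly zero.
    lam=0.5 is an illustrative value, not one taken from the paper."""
    return x if abs(x) > lam else 0.0


# Apply element-wise to a small list of latent values.
latents = [-1.2, -0.3, 0.0, 0.4, 0.9]
shrunk = [hard_shrink(v) for v in latents]  # small values are zeroed
```

Because the function is a single comparison per element, it is much cheaper than GDN, which is consistent with the lower complexity the abstract reports. PyTorch provides the same operation as `torch.nn.Hardshrink`.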
- Supplementary Content
- 10.6092/polito/porto/2671060
- Jan 1, 2017
- Politecnico di Torino
The main contribution of this thesis is the introduction of new methods for designing adaptive transforms for image and video compression. Exploiting graph signal processing techniques, we develop new graph construction methods targeted for image and video compression applications. In this way, we obtain a graph that is, at the same time, a good representation of the image and easy to transmit to the decoder. To do so, we investigate different research directions. First, we propose a new method for graph construction that employs innovative edge metrics, quantization and edge prediction techniques. Then, we propose to use a graph learning approach and we introduce a new graph learning algorithm targeted for image compression that defines the connectivities between pixels by taking into consideration the coding of the image signal and the graph topology in rate-distortion term. Moreover, we also present a new superpixel-driven graph transform that uses clusters of superpixel as coding blocks and then computes the graph transform inside each region. In the second part of this work, we exploit graphs to design directional transforms. In fact, an efficient representation of the image directional information is extremely important in order to obtain high performance image and video coding. In this thesis, we present a new directional transform, called Steerable Discrete Cosine Transform (SDCT). This new transform can be obtained by steering the 2D-DCT basis in any chosen direction. Moreover, we can also use more complex steering patterns than a single pure rotation. In order to show the advantages of the SDCT, we present a few image and video compression methods based on this new directional transform. The obtained results show that the SDCT can be efficiently applied to image and video compression and it outperforms the classical DCT and other directional transforms. Along the same lines, we present also a new generalization of the DFT, called Steerable DFT (SDFT). 
Unlike the SDCT, the SDFT can be defined in one or two dimensions. The 1D-SDFT represents a rotation in the complex plane, whereas the 2D-SDFT performs a rotation in the 2D Euclidean space.
- Research Article
199
- 10.1109/tcsvt.2007.903663
- Oct 1, 2007
- IEEE Transactions on Circuits and Systems for Video Technology
In this paper, image compression utilizing visual redundancy is investigated. Inspired by recent advancements in image inpainting techniques, we propose an image compression framework towards visual quality rather than pixel-wise fidelity. In this framework, an original image is analyzed at the encoder side so that portions of the image are intentionally and automatically skipped. Instead, some information is extracted from these skipped regions and delivered to the decoder as assistant information in the compressed fashion. The delivered assistant information plays a key role in the proposed framework because it guides image inpainting to accurately restore these regions at the decoder side. Moreover, to fully take advantage of the assistant information, a compression-oriented edge-based inpainting algorithm is proposed for image restoration, integrating pixel-wise structure propagation and patch-wise texture synthesis. We also construct a practical system to verify the effectiveness of the compression approach in which edge map serves as assistant information and the edge extraction and region removal approaches are developed accordingly. Evaluations have been made in comparison with baseline JPEG and standard MPEG-4 AVC/H.264 intra-picture coding. Experimental results show that our system achieves up to 44% and 33% bits-savings, respectively, at similar visual quality levels. Our proposed framework is a promising exploration towards future image and video compression.
- Research Article
16
- 10.1109/tcsvt.2022.3229701
- Jun 1, 2023
- IEEE Transactions on Circuits and Systems for Video Technology
Progressive coding is essential to the practical deployment of learned image compression over heterogeneous networks and clients. Existing methods for learned progressive image compression require complex and empirical design to achieve near-optimal rate-distortion performance over a wide range of bit-rates. However, these methods are limited by the implicit learned mechanism based on neural networks and introduction of uniform quantizers. In this paper, we propose generalized learned progressive image compression with analytic rate-distortion optimization using dead-zone quantizers on the latent representation. Specifically, we reveal that dead-zone quantizers, as a general case of uniform quantizers, are equivalent to uniform quantizers in fixed-rate nonlinear transform coding and can prevent extra redundancy in embedded quantization for progressive coding. Consequently, we propose rate-distortion optimized learned progressive coding by approximating the optimal quantizer in the source spaces using dead-zone quantizers in an analytic manner on the Laplacian source. To our best knowledge, this paper is the first to achieve general learned progressive coding from the perspective of optimal quantizers. The proposed method achieves theoretically sound and practically efficient embedded quantization and learned progressive coding of latent representations with improved rate-distortion performance. It can also enable embedded quantization with diverse assignments of truncation points and support flexible configuration of quality layers of varying numbers and at varying target bit-rates. Furthermore, we successfully incorporate the proposed method into existing pre-trained fixed-rate models to realize progressive learned image compression without re-training. Experimental results demonstrate that the proposed method achieves state-of-the-art rate-distortion performance in learned progressive image compression compared with traditional codecs and recent learned methods.
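The relationship between uniform and dead-zone quantizers described above can be illustrated with a textbook scalar dead-zone quantizer. This is a sketch under stated assumptions, not the paper's rate-distortion optimized method: the names `deadzone_quantize`, `dequantize`, and `drop_lsb`, the rounding-offset parameterization, and reconstruction at `index * step` are all choices made for the example.

```python
import math


def deadzone_quantize(x, step, offset=0.5):
    """Map x to an integer index.
    offset=0.5 is ordinary uniform rounding; a smaller offset widens the
    zero bin (the dead zone) to 2 * (1 - offset) * step."""
    sign = -1 if x < 0 else 1
    return sign * math.floor(abs(x) / step + offset)


def dequantize(index, step):
    """Reconstruct at index * step (an assumed, simple reconstruction rule)."""
    return index * step


def drop_lsb(index):
    """Discard the least significant magnitude bit of an index,
    as an embedded (progressive) bitstream would when truncated."""
    sign = -1 if index < 0 else 1
    return sign * (abs(index) // 2)


# With offset=0 (pure truncation), dropping one bit of the fine index
# gives exactly the index of a quantizer with twice the step size.
fine = deadzone_quantize(3.7, 1.0, offset=0.0)
coarse = deadzone_quantize(3.7, 2.0, offset=0.0)
```

The last lines show why dead-zone quantizers suit progressive coding: coarser quality layers are nested inside finer ones, so a truncated bitstream still decodes to a valid coarser quantization.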
- Research Article
77
- 10.1016/j.neucom.2016.06.050
- Jun 22, 2016
- Neurocomputing
Lossless image compression based on integer Discrete Tchebichef Transform
- Research Article
4
- 10.1142/s0129156497000056
- Mar 1, 1997
- International Journal of High Speed Electronics and Systems
The area of image and video compression has made tremendous progress over the last several decades. The successes in image compression are due to advances and better understanding of waveform coding methods which take advantage of the signal statistics, perceptual methods which take advantage of psychovisual properties of the human visual system (HVS) and object-based models especially for very low bit rate work. Recent years have produced several image coding standards—JPEG for still image compression and H.261, MPEG-I and MPEG-II for video compression. While we have devoted a special section in this paper to cover international coding standards because of their practical value, we have also covered a large class of nonstandard coding technology in the interest of completeness and potential future value. Very low bit rate video coding remains a challenging problem as does our understanding of the human visual system for perceptually optimum compression. The wide range of applications and bit rates, from video telephony at rates as low as 9.6 kbps to HDTV at 20 Mbps and higher, has acted as a catalyst for generating new ideas in tackling the different challenges characterized by the particular application. The area of image compression will remain an interesting and fruitful area of research as we focus on combining source coding with channel coding and multimedia networking.
- Research Article
66
- 10.1109/tcsvt.2021.3119660
- Jun 1, 2022
- IEEE Transactions on Circuits and Systems for Video Technology
Recent works on learned image compression perform encoding and decoding processes in a full-resolution manner, resulting in two problems when deployed for practical applications. First, parallel acceleration of the autoregressive entropy model cannot be achieved due to serial decoding. Second, full-resolution inference often causes the out-of-memory (OOM) problem with limited GPU resources, especially for high-resolution images. Block partition is a good choice to handle the above issues, but it brings about new challenges in reducing the redundancy between blocks and eliminating block effects. To tackle the above challenges, this paper provides a learned block-based hybrid image compression (LBHIC) framework. Specifically, we introduce explicit intra prediction into a learned image compression framework to utilize the relation among adjacent blocks. Superior to context modeling by linear weighting of neighbor pixels in traditional codecs, we propose a contextual prediction module (CPM) to better capture long-range correlations by utilizing the strip pooling to extract the most relevant information in neighboring latent space, thus achieving effective information prediction. Moreover, to alleviate blocking artifacts, we further propose a boundary-aware postprocessing module (BPM) with the edge importance taken into account. Extensive experiments demonstrate that the proposed LBHIC codec outperforms the VVC, with a bit-rate conservation of 4.1%, and reduces the decoding time by approximately 86.7% compared with that of state-of-the-art learned image compression methods.
- Research Article
102
- 10.1109/tmm.2019.2938345
- Sep 5, 2019
- IEEE Transactions on Multimedia
Image compression has been an important research topic for many decades. Recently, deep learning has achieved great success in many computer vision tasks, and its use in image compression has gradually been increasing. In this paper, we present an energy compaction-based image compression architecture using a convolutional autoencoder (CAE) to achieve high coding efficiency. Our main contributions include three aspects: 1) we propose a CAE architecture for image compression by decomposing it into several down(up)sampling operations; 2) for our CAE architecture, we offer a mathematical analysis on the energy compaction property and we are the first work to propose a normalized coding gain metric in neural networks, which can act as a measurement of compression capability; 3) based on the coding gain metric, we propose an energy compaction-based bit allocation method, which adds a regularizer to the loss function during the training stage to help the CAE maximize the coding gain and achieve high compression efficiency. The experimental results demonstrate our proposed method outperforms BPG (HEVC-intra), in terms of the MS-SSIM quality metric. Additionally, we achieve better performance in comparison with existing bit allocation methods, and provide higher coding efficiency compared with state-of-the-art learning compression methods at high bit rates.
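As background for the coding gain metric mentioned above, the classical transform coding gain is the ratio of the arithmetic mean to the geometric mean of the per-channel variances. The sketch below computes that classical quantity only; it is not the normalized metric proposed in the paper, whose exact definition is not given in the abstract. The names `variance` and `coding_gain` and the list-of-lists input are assumptions for illustration.

```python
import math


def variance(values):
    """Population variance of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def coding_gain(channels):
    """Classical coding gain: arithmetic mean of the channel variances
    divided by their geometric mean. Every channel must have nonzero
    variance, otherwise the geometric mean is undefined here."""
    variances = [variance(c) for c in channels]
    if any(v <= 0 for v in variances):
        raise ValueError("every channel needs a positive variance")
    arithmetic = sum(variances) / len(variances)
    geometric = math.exp(sum(math.log(v) for v in variances) / len(variances))
    return arithmetic / geometric
```

A gain of 1 means energy is spread evenly across channels; larger values mean energy is concentrated in a few channels, which is the compaction property that a regularizer of this kind would encourage.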
- Book Chapter
36
- 10.1007/978-3-030-69538-5_36
- Jan 1, 2021
Recent advances in deep generative modeling have enabled efficient modeling of high dimensional data distributions and opened up a new horizon for solving data compression problems. Specifically, autoencoder based learned image or video compression solutions are emerging as strong competitors to traditional approaches. In this work, we propose a new network architecture, based on common and well studied components, for learned video compression operating in low latency mode. Our method yields competitive MS-SSIM/rate performance on the high-resolution UVG dataset, among both learned video compression approaches and classical video compression methods (H.265 and H.264) in the rate range of interest for streaming applications. Additionally, we provide an analysis of existing approaches through the lens of their underlying probabilistic graphical models. Finally, we point out issues with temporal consistency and color shift observed in empirical evaluation, and suggest directions forward to alleviate those.
- Research Article
- 10.1109/tcsvt.2024.3522621
- May 1, 2025
- IEEE Transactions on Circuits and Systems for Video Technology
In the field of autonomous driving, a variety of sensor data types exist, each representing different modalities of the same scene. Therefore, it is feasible to utilize data from other sensors to facilitate image compression. However, few techniques have explored the potential benefits of utilizing inter-modality correlations to enhance the image compression performance. In this paper, motivated by the recent success of learned image compression, we propose a new framework that uses sparse point clouds to assist in learned image compression in the autonomous driving scenario. We first project the 3D sparse point cloud onto a 2D plane, resulting in a sparse depth map. Utilizing this depth map, we proceed to predict camera images. Subsequently, we use these predicted images to extract multi-scale structural features. These features are then incorporated into learned image compression pipeline as additional information to improve the compression performance. Our proposed framework is compatible with various mainstream learned image compression models, and we validate our approach using different existing image compression methods. The experimental results show that incorporating point cloud assistance into the compression pipeline consistently enhances the performance.
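The first step of the pipeline above, projecting a sparse 3D point cloud onto a 2D plane to obtain a sparse depth map, can be sketched with a pinhole camera model. This is an illustrative sketch, not the authors' code: it assumes the points are already expressed in camera coordinates (x right, y down, z forward), and the function name `project_to_depth_map` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical parameters.

```python
def project_to_depth_map(points, fx, fy, cx, cy, width, height):
    """Project (x, y, z) camera-frame points to a height x width depth map.
    Pixels with no point keep the value 0.0; when several points land on
    one pixel, the nearest depth is kept."""
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue  # point is behind the camera
        # Pinhole projection to integer pixel coordinates.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth


# Tiny example: four points, a 3x3 image, unit focal length.
cloud = [(0, 0, 2), (0, 0, 1), (1, 0, 1), (0, 0, -1)]
depth_map = project_to_depth_map(cloud, fx=1, fy=1, cx=1, cy=1, width=3, height=3)
```

In a real driving setup the LiDAR points would first be moved into the camera frame with the extrinsic calibration, and most pixels of the resulting map stay empty, which is why the abstract calls it a sparse depth map.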