Remote Sensing Image Compression Based on the Multiple Prior Information
This paper develops a learned lossy compression framework for high-resolution remote sensing images, combining transformer-based and CNN-based hyperpriors to exploit local and non-local redundancies. Experiments show the method outperforms traditional algorithms like JPEG and JPEG2000, achieving approximately 2 dB higher PSNR at similar or lower bitrates.
Learned image compression has achieved a series of breakthroughs for natural images, but there is little literature focusing on high-resolution remote sensing image (HRRSI) datasets. This paper focuses on designing a learned lossy image compression framework for compressing HRRSIs. Considering the local and non-local redundancy contained in HRRSIs, a mixed hyperprior network is designed to explore both kinds of redundancy in order to improve the accuracy of entropy estimation. In detail, a transformer-based hyperprior and a CNN-based hyperprior are fused for entropy estimation. Furthermore, to reduce the mismatch between training and testing, a three-stage training strategy is introduced to refine the network. In this training strategy, the entire network is first trained, and then some sub-networks are fixed while the others are trained. To evaluate the effectiveness of the proposed compression algorithm, experiments are conducted on an HRRSI dataset. The results show that the proposed algorithm achieves comparable or better compression performance than some traditional and learned image compression algorithms, such as Joint Photographic Experts Group (JPEG) and JPEG2000. At a similar or lower bitrate, the proposed algorithm achieves a PSNR about 2 dB higher than that of JPEG2000.
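For readers unfamiliar with the metric, the following is a minimal sketch of how PSNR is commonly computed from the mean squared error. It is an illustration only, not the evaluation code of the paper: the function names are hypothetical, images are represented as flat lists of pixel values, and an 8-bit peak value of 255 is assumed because the abstract does not state the bit depth.

```python
import math


def mse(reference, reconstruction):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - d) ** 2 for r, d in zip(reference, reconstruction)) / len(reference)


def psnr(reference, reconstruction, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit samples."""
    err = mse(reference, reconstruction)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / err)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)
```

Under this definition, a 2 dB PSNR gain at the same peak value corresponds to a mean squared error that is lower by a factor of 10^0.2, roughly 1.58.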
- Research Article
16
- 10.1109/tcsvt.2022.3229701
- Jun 1, 2023
- IEEE Transactions on Circuits and Systems for Video Technology
Progressive coding is essential to the practical deployment of learned image compression over heterogeneous networks and clients. Existing methods for learned progressive image compression require complex and empirical design to achieve near-optimal rate-distortion performance over a wide range of bit-rates. However, these methods are limited by the implicit learned mechanism based on neural networks and the introduction of uniform quantizers. In this paper, we propose generalized learned progressive image compression with analytic rate-distortion optimization using dead-zone quantizers on the latent representation. Specifically, we reveal that dead-zone quantizers, as a general case of uniform quantizers, are equivalent to uniform quantizers in fixed-rate nonlinear transform coding and can prevent extra redundancy in embedded quantization for progressive coding. Consequently, we propose rate-distortion optimized learned progressive coding by approximating the optimal quantizer in the source spaces using dead-zone quantizers in an analytic manner on the Laplacian source. To the best of our knowledge, this paper is the first to achieve general learned progressive coding from the perspective of optimal quantizers. The proposed method achieves theoretically sound and practically efficient embedded quantization and learned progressive coding of latent representations with improved rate-distortion performance. It can also enable embedded quantization with diverse assignments of truncation points and support flexible configuration of quality layers of varying numbers and at varying target bit-rates. Furthermore, we successfully incorporate the proposed method into existing pre-trained fixed-rate models to realize progressive learned image compression without re-training. Experimental results demonstrate that the proposed method achieves state-of-the-art rate-distortion performance in learned progressive image compression compared with traditional codecs and recent learned methods.
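The link between dead-zone quantizers and embedded (progressive) quantization can be illustrated with a textbook scalar quantizer. The sketch below is not the method of the paper above; it uses one common parametrization, in which a rounding offset `theta` (a name chosen here for illustration) controls the width of the zero bin: `theta = 0.5` gives an ordinary uniform quantizer, and `theta = 0` gives a zero bin twice as wide as the others.

```python
import math


def deadzone_quantize(x, step, theta=0.0):
    """Map x to a signed integer index; theta < 0.5 widens the zero bin."""
    magnitude = math.floor(abs(x) / step + theta)
    return int(math.copysign(magnitude, x)) if magnitude else 0


def deadzone_dequantize(q, step, theta=0.0):
    """Reconstruct at the midpoint of the bin selected by index q."""
    if q == 0:
        return 0.0
    return math.copysign((abs(q) - theta + 0.5) * step, q)


def coarsen(q):
    """Drop the least significant bit of the index magnitude."""
    magnitude = abs(q) >> 1
    return int(math.copysign(magnitude, q)) if magnitude else 0
```

With `theta = 0`, dropping the lowest bit of an index produced with step size `step` gives exactly the index that the same quantizer would produce with step size `2 * step`. This nesting is what lets a decoder stop after any bit plane and still hold a valid, coarser quantization of the same value.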
- Research Article
66
- 10.1109/tcsvt.2021.3119660
- Jun 1, 2022
- IEEE Transactions on Circuits and Systems for Video Technology
Recent works on learned image compression perform encoding and decoding processes in a full-resolution manner, resulting in two problems when deployed for practical applications. First, parallel acceleration of the autoregressive entropy model cannot be achieved due to serial decoding. Second, full-resolution inference often causes the out-of-memory (OOM) problem with limited GPU resources, especially for high-resolution images. Block partition is a good choice to handle the above issues, but it brings about new challenges in reducing the redundancy between blocks and eliminating block effects. To tackle the above challenges, this paper provides a learned block-based hybrid image compression (LBHIC) framework. Specifically, we introduce explicit intra prediction into a learned image compression framework to utilize the relation among adjacent blocks. Superior to context modeling by linear weighting of neighbor pixels in traditional codecs, we propose a contextual prediction module (CPM) to better capture long-range correlations by utilizing the strip pooling to extract the most relevant information in neighboring latent space, thus achieving effective information prediction. Moreover, to alleviate blocking artifacts, we further propose a boundary-aware postprocessing module (BPM) with the edge importance taken into account. Extensive experiments demonstrate that the proposed LBHIC codec outperforms the VVC, with a bit-rate conservation of 4.1%, and reduces the decoding time by approximately 86.7% compared with that of state-of-the-art learned image compression methods.
- Research Article
- 10.1109/tcsvt.2024.3522621
- May 1, 2025
- IEEE Transactions on Circuits and Systems for Video Technology
In the field of autonomous driving, a variety of sensor data types exist, each representing different modalities of the same scene. Therefore, it is feasible to utilize data from other sensors to facilitate image compression. However, few techniques have explored the potential benefits of utilizing inter-modality correlations to enhance image compression performance. In this paper, motivated by the recent success of learned image compression, we propose a new framework that uses sparse point clouds to assist learned image compression in the autonomous driving scenario. We first project the 3D sparse point cloud onto a 2D plane, resulting in a sparse depth map. Utilizing this depth map, we proceed to predict camera images. Subsequently, we use these predicted images to extract multi-scale structural features. These features are then incorporated into the learned image compression pipeline as additional information to improve the compression performance. Our proposed framework is compatible with various mainstream learned image compression models, and we validate our approach using different existing image compression methods. The experimental results show that incorporating point cloud assistance into the compression pipeline consistently enhances the performance.
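The first step described above, projecting a sparse 3D point cloud onto the image plane to obtain a sparse depth map, can be sketched with a standard pinhole camera model. This is an assumption-laden illustration rather than the authors' pipeline: the points are assumed to be already expressed in camera coordinates, the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical parameters, and a depth of 0.0 marks pixels with no measurement.

```python
def project_to_sparse_depth(points, fx, fy, cx, cy, width, height):
    """Project (x, y, z) camera-frame points to a height x width depth map.

    Pixels hit by no point keep the value 0.0. When several points land
    on the same pixel, the nearest one (smallest z) is kept.
    """
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue  # behind the camera or on the image plane
        u = int(round(fx * x / z + cx))  # column index
        v = int(round(fy * y / z + cy))  # row index
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] == 0.0 or z < depth[v][u]:
                depth[v][u] = z
    return depth
```

Because a lidar sweep covers far fewer directions than the camera has pixels, most entries stay at 0.0, which is why the abstract calls the result a sparse depth map.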
- Conference Article
112
- 10.1109/cvpr.2019.01031
- Jun 1, 2019
Compression has been an important research topic for many decades, with a significant impact on data transmission and storage. Recent advances have shown the great potential of learned image and video compression. Inspired by related works, in this paper, we present an image compression architecture using a convolutional autoencoder, and then generalize image compression to video compression by adding an interpolation loop to both the encoder and decoder sides. Our basic idea is to realize spatial-temporal energy compaction in learned image and video compression. Thereby, we propose to add a spatial energy compaction-based penalty to the loss function to achieve higher image compression performance. Furthermore, based on the temporal energy distribution, we propose to select the number of frames in one interpolation loop, adapting to the motion characteristics of the video content. Experimental results demonstrate that our proposed image compression outperforms the latest image compression standard in terms of the MS-SSIM quality metric, and provides higher performance compared with state-of-the-art learning-based compression methods at high bit rates, which benefits from our spatial energy compaction approach. Meanwhile, our proposed video compression approach with temporal energy compaction can significantly outperform MPEG-4 and is competitive with the commonly used H.264. Both our image and video compression can produce more visually pleasing results than traditional standards.
- Conference Article
3
- 10.1109/icip40778.2020.9190974
- Oct 1, 2020
With recent advances in learned entropy and context models, the rate-distortion performance of deep learned image compression methods reached or surpassed those of conventional codecs. However, learned image compression is currently more complex and slower than conventional image compression. Learned image and video compression methods almost exclusively employ the generalized divisive normalization (GDN) activation function. This paper investigates the effect of activation function on the performance of image compression in terms of both objective and subjective criteria as well as runtime. In particular, we show that the distribution of latents produced by hard shrinkage fits a Laplacian better, and it is possible to achieve similar rate-distortion and better visual performance using hard shrinkage with lower complexity.
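For reference, hard shrinkage is usually defined as the element-wise function that passes a value through unchanged when its magnitude exceeds a threshold and outputs zero otherwise. The sketch below follows that common definition; the default threshold of 0.5 is an assumption chosen for illustration, and the abstract does not state which value the authors use.

```python
def hard_shrink(x, threshold=0.5):
    """Return x if |x| > threshold, otherwise 0.0."""
    return x if abs(x) > threshold else 0.0


def hard_shrink_all(values, threshold=0.5):
    """Apply hard shrinkage element-wise to a list of floats."""
    return [hard_shrink(v, threshold) for v in values]
```

For example, `hard_shrink_all([-0.7, -0.2, 0.0, 0.4, 1.3])` keeps only -0.7 and 1.3 and zeroes the rest. Values near zero collapse to exactly zero, which concentrates mass at the origin of the latent distribution.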
- Research Article
9
- 10.1016/j.neucom.2022.07.065
- Jul 22, 2022
- Neurocomputing
Successive learned image compression: Comprehensive analysis of instability
- Research Article
6
- 10.3390/rs14061319
- Mar 9, 2022
- Remote Sensing
Content-based remote sensing (RS) image retrieval (CBRSIR) is a critical way to organize high-resolution RS (HRRS) images in the current big data era. The increasing volume of HRRS images from different satellites and sensors has drawn more attention to the cross-source CBRSIR (CS-CBRSIR) problem. Due to data drift, one crucial problem in CS-CBRSIR is the modality discrepancy. Most existing methods focus on finding a common feature space for various HRRS images to address this issue. In this space, their similarity relations can be measured directly to obtain the cross-source retrieval results. This approach is feasible and reasonable; however, the specific information corresponding to HRRS images from different sources is always ignored, limiting retrieval performance. To overcome this limitation, we develop a new model for CS-CBRSIR in this paper named dual modality collaborative learning (DMCL). To fully explore the specific information from diverse HRRS images, DMCL first introduces ResNet50 as the feature extractor. Then, a common space mutual learning module is developed to map the specific features into a common space. Here, the modality discrepancy is reduced from the aspects of features and their distributions. Finally, to supplement the common features with the specific knowledge, we develop the modality transformation and dual-modality feature learning modules. Their function is to transmit the specific knowledge from different sources mutually and fuse the specific and common features adaptively. Comprehensive experiments are conducted on a public dataset. Compared with many existing methods, our DMCL performs better. These encouraging results on a public dataset indicate that the proposed DMCL is useful in CS-CBRSIR tasks.
- Conference Article
355
- 10.1109/cvpr52688.2022.00563
- Jun 1, 2022
Recently, learned image compression techniques have achieved remarkable performance, even surpassing the best manually designed lossy image coders. They are promising candidates for large-scale adoption. For the sake of practicality, a thorough investigation of the architecture design of learned image compression, regarding both compression performance and running speed, is essential. In this paper, we first propose uneven channel-conditional adaptive coding, motivated by the observation of energy compaction in learned image compression. Combining the proposed uneven grouping model with existing context models, we obtain a spatial-channel contextual adaptive model to improve the coding performance without damage to running speed. Then we study the structure of the main transform and propose an efficient model, ELIC, to achieve state-of-the-art speed and compression ability. With superior performance, the proposed model also supports extremely fast preview decoding and progressive decoding, which makes the coming application of learning-based image compression more promising.
- Dissertation
- 10.33915/etd.13084
- Jan 1, 2025
The rapid advancement of information technology and the exponential growth of digital communication have significantly increased the demand for efficient data compression techniques that reduce storage requirements, minimize bandwidth consumption, and accelerate data transmission—without substantially compromising data quality. This dissertation addresses these challenges by investigating and developing advanced learned image compression (LIC) methods, with a particular focus on lossy compression for both natural images and scientific imagery obtained from NASA’s Solar Dynamics Observatory (SDO) mission. Traditional image compression standards—such as JPEG, JPEG2000, BPG, and HEVC—rely on manually engineered transforms and heuristic rules, which often lack the adaptability required to accommodate diverse visual content and application-specific constraints. In contrast, learned image compression employs deep neural networks trained in an end-to-end manner, guided by principles from rate–distortion theory, to optimize the trade-off between compression efficiency and reconstruction fidelity. In the first part of this dissertation, several technical challenges in developing neural image compression codecs for natural images (general-purpose) are addressed, including the design of expressive nonlinear transforms, accurate entropy modeling, and the integration of perceptually meaningful loss functions. To this end, several learned image compression frameworks are proposed, each introducing distinct design innovations: a Transformer-based nonlinear transform that captures both local and global dependencies, an advanced entropy model that improves probability estimation and coding efficiency, and a conditional diffusion-based generative framework that enhances the perceptual quality of reconstructed images. The second part focuses on the application of learned compression to imagery from NASA’s Solar Dynamics Observatory (SDO) mission. 
A learned video compression framework is developed to exploit both spatial and temporal redundancies in solar image sequences. Furthermore, an adaptive compression strategy is introduced to prioritize scientific relevance: images containing solar flare events are compressed at lower ratios to preserve critical information, whereas non-flare images are compressed more aggressively to maximize storage and transmission efficiency. Collectively, these contributions advance the field of learned image compression across both general-purpose and scientific imaging domains, providing practical solutions for improving data transmission and storage efficiency in real-world and mission-critical environments.
- Conference Article
4
- 10.1109/vcip49819.2020.9301753
- Dec 1, 2020
The DCT-based transform coding technique has been adopted by the international standards (ISO JPEG, ITU H.261/264/265, ISO MPEG-2/4/H, and many others) for nearly 30 years. Although researchers are still trying to improve its efficiency by fine-tuning its components and parameters, the basic structure has not changed in the past two decades. The deep learning technology recently developed may provide a new direction for constructing a high-compression image/video coding system. Recent results, particularly from the Challenge on Learned Image Compression (CLIC) at CVPR, indicate that this new type of scheme (often trained end-to-end) may have good potential for further improving compression efficiency. In the first part of this tutorial, we shall (1) summarize briefly the progress of this topic in the past 3 or so years, including an overview of CLIC results and the JPEG AI Call-for-Evidence Challenge on Learning-based Image Coding (issued in early 2020). Because Deep Neural Network (DNN)-based image compression is a new area, several techniques and structures have been tested. The recently published autoencoder-based schemes can achieve similar PSNR to BPG (Better Portable Graphics, H.265 still image standard) and have superior subjective quality (e.g., MS-SSIM), especially at very low bit rates. In the second part, we shall (2) address the detailed design concepts of image compression algorithms using the autoencoder structure. In the third part, we shall switch gears to (3) explore the emerging area of DNN-based video compression. Recent publications in this area have indicated that end-to-end trained video compression can achieve comparable or superior rate-distortion performance to HEVC/H.265. The CLIC at CVPR 2020 also created for the first time a new track dedicated to P-frame coding.
- Conference Article
2
- 10.1109/cccai59026.2023.00041
- Jun 1, 2023
In recent years, there has been rapid development in learned image compression techniques that prioritize rate-distortion-perceptual compression, preserving fine details even at lower bit-rates. However, current learning-based image compression methods often sacrifice human-friendly compression and require long decoding times. In this paper, we propose enhancements to the backbone network and loss function of an existing image compression model, focusing on improving human perception and efficiency. Our proposed approach achieves competitive subjective results compared to state-of-the-art end-to-end learned image compression methods and classic methods, while requiring less decoding time and offering human-friendly compression. Through empirical evaluation, we demonstrate the effectiveness of our proposed method, which achieves a bit-rate saving of more than 25% at comparable perceptual quality.
- Research Article
- 10.1145/3803542
- Mar 25, 2026
- ACM Transactions on Multimedia Computing, Communications, and Applications
Recently, learned image compression models have achieved better compression performance than traditional non-learning image compression standards. Those learned models usually utilize spatial self-attention and CNNs to extract non-local and local features and generate the latent representation. However, previous methods adopt a linear layer to fuse non-local and local features and lack the flexibility to adaptively adjust feature weights and capture complex non-linear interactions between distinct feature representations. Additionally, how to more effectively compress the latent representation based on its channel similarity characteristics remains unexplored. To solve the above issues, we propose a novel image compression method with frequency feature interaction and a non-local cross-similarity prior. More specifically, we extend the previous spatial self-attention module and alternately use spatial and channel self-attention modules to extract non-local spatial and channel features, respectively, and depth-wise convolution is utilized to extract local features. As local features focus on high-frequency detail information and non-local features concentrate on low-frequency structural information, we propose a frequency interaction module (FIM) that generates two weight maps to dynamically fuse non-local and local features. Moreover, we observe non-local cross-similarity in different channels of the latent representation, which indicates that different channels share similar non-local semantic and structural information, but have distinct local detail information. We therefore design a dual transformer entropy model to emphasize non-local features and remove local features. Experimental results validate that our method achieves promising compression performance on the Kodak, CLIC and Tecnick datasets.
- Conference Article
6
- 10.1109/igarss47720.2021.9553089
- Jul 11, 2021
As an important research topic in the remote sensing (RS) community, RS image scene classification is a challenging task due to the complex contents of RS images. In general, RS image scene classification is a single-label problem. Nevertheless, it is known that the contents within RS images are huge in volume and diverse in type. A single semantic label alone cannot describe an RS scene completely, especially as the resolution of RS images has increased recently. The various semantics hidden in high-resolution RS (HRRS) images are also important to the scene classification task. Taking the issues mentioned above into account, we develop a new scene classifier named graph scene classifier (GSCer) for HRRS images with the help of the deep convolution neural network (DCNN) and dynamic graph convolution (DGCN). Not only the global semantics but also the diverse hidden local semantics within an HRRS image can be fully explored. The encouraging experimental results obtained on two public HRRS data sets demonstrate that our GSCer is effective in HRRS scene classification tasks.
- Research Article
22
- 10.1049/ip-vis:20040755
- Jan 1, 2004
- IEE Proceedings - Vision, Image, and Signal Processing
Medical images are widely used in the diagnosis of diseases. These imaging modalities include computerised tomography (CT), magnetic resonance imaging (MRI), ultrasonic (US) imaging, X-radiographs, etc. However, medical images have large storage requirements when high resolution is demanded; therefore, they need to be compressed to reduce the data size so as to achieve a low bit rate for transmission or storage, while maintaining image information. The Joint Photographic Experts Group (JPEG) developed an image compression tool that is one of the most widely used products for image compression. One of the factors influencing the performance of JPEG compression is the quantisation table. The bit rate and the decoded quality are determined simultaneously by the quantisation table, and therefore, the table has a strong influence on the whole compression performance. The author aims to provide a design procedure to seek sets of better quantisation parameters to raise the compression performance to achieve a lower bit rate while preserving high decoded quality. A genetic algorithm (GA) was employed to promote higher compression performance for medical images. The goal was to develop a design procedure to find quantisation tables that contribute to better compression efficiency in terms of bit rate and decoded quality. Simulations were carried out on different kinds of medical images. Resulting experimental data demonstrate that the GA-based search procedures can generate better performance than JPEG 2000 and JPEG even though the training images have different features. Additionally, if existing published quantisation tables are put into the crossover pool in the proposed GA-based system, it can improve the performance by yielding better quantisation tables.
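The role of the quantisation table described above can be made concrete with a small sketch: each transform coefficient is divided by the matching table entry and rounded, and the decoder multiplies back. This is an illustration only, not the author's GA procedure. The helper names are hypothetical, the tables in the usage note are made-up values rather than entries from any standard, and the functions accept blocks of any matching shape although JPEG itself works on 8x8 blocks.

```python
import math


def _round_half_away(value):
    """Round to the nearest integer, with ties going away from zero."""
    return int(math.copysign(math.floor(abs(value) + 0.5), value))


def quantize_block(coeffs, table):
    """Divide each coefficient by its table entry and round to an integer."""
    return [
        [_round_half_away(c / q) for c, q in zip(coeff_row, table_row)]
        for coeff_row, table_row in zip(coeffs, table)
    ]


def dequantize_block(levels, table):
    """Decoder side: multiply each integer level by its table entry."""
    return [
        [level * q for level, q in zip(level_row, table_row)]
        for level_row, table_row in zip(levels, table)
    ]
```

With the illustrative coefficients `[[-26.0, 8.0], [5.0, 100.0]]` and table `[[16, 11], [12, 14]]`, the quantised levels are `[[-2, 1], [0, 7]]` and the reconstruction is `[[-32, 11], [0, 98]]`. Larger table entries push more levels to zero, which lowers the bit rate at the cost of larger reconstruction error. That is the trade-off a table search has to balance.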
- Research Article
22
- 10.1109/tgrs.2021.3075956
- May 17, 2021
- IEEE Transactions on Geoscience and Remote Sensing
A content-weighted compression scheme for high-resolution remote-sensing (RS) images can be well modeled by Markov random field (MRF)-oriented attention. This article addresses high-resolution RS image compression by incorporating MRF into the attention mechanism. To this end, we implicitly reformulate the attention mechanism with MRF-based probabilistic graph modeling and combine the target of image compression and the parameter learning of MRF in a unified framework, namely the high-order MRF-oriented attention (HMA) network. Specifically, HMA extends the key-value query (KVQ) pairwise terms of vanilla attention to high-order terms, by which the prior information can be expressed effectively to boost the performance of high-resolution RS image compression. HMA offers several advantages. First, unlike the vanilla attention network, which is apt to yield coarse features, HMA is capable of outputting more pleasing decoding results. Second, HMA can accelerate convergence in the training of deep neural networks (DNNs), thus facilitating deployment on resource-limited IoT devices. Third, HMA demonstrates its potential for processing semantic joint tasks. Moreover, we thoroughly evaluate our approach on standard data sets of varying resolutions; the proposed framework performs favorably against most image coding standards and DNN-based codecs on the ISPRS Vaihingen data set and the USC-SIPI data set, especially at low bit rates.