A Study on the Effect of Color Spaces in Learned Image Compression
In this work, we present a comparison of the YUV, LAB, and RGB color spaces and their effect on learned image compression. For this we use the structure and color based learned image codec (SLIC) from our prior work, which consists of two branches: one for the luminance component (Y or L) and another for the chrominance components (UV or AB). For the RGB variant, however, all three channels are fed into a single branch, similar to most learned image codecs operating in RGB. The models are trained for multiple bitrate configurations in each color space. We report the findings from our experiments by evaluating the models on various datasets and comparing the results to state-of-the-art image codecs. The YUV model performs better than the LAB variant in terms of MS-SSIM, with a Bjøntegaard delta bitrate (BD-BR) gain of 7.5% using VTM intra-coding mode as the baseline, whereas the LAB variant outperforms the YUV model in terms of CIEDE2000, with a BD-BR gain of 8%. Overall, the RGB variant of SLIC achieves the best performance, with a BD-BR gain of 13.14% in terms of MS-SSIM and of 17.96% in CIEDE2000, at the cost of higher model complexity.
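As background for the two-branch luminance/chrominance design described above, the RGB-to-YUV conversion can be sketched per pixel. This is the classic analog BT.601 form; the exact conversion matrix used in SLIC is not stated in the abstract, so the coefficients below are an assumption:

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (floats in [0, 1]) to YUV (analog BT.601 form).

    Y is the luminance a two-branch codec handles in its first branch;
    U and V are the chrominance difference signals for the second branch.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luma as weighted sum of RGB
    u = 0.492 * (b - y)  # scaled blue-difference chroma
    v = 0.877 * (r - y)  # scaled red-difference chroma
    return y, u, v
```

For a pure gray input the chroma channels vanish, which is what lets a two-branch codec devote one branch entirely to luminance structure.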
- Research Article
2
- 10.5194/isprs-annals-x-4-w4-2024-201-2024
- May 31, 2024
- ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Abstract. Precise road segmentation is an essential part of many applications related to road information extraction from remote sensing data. The effect of color space on road detection has rarely been studied. In this paper, the effects of different color spaces of aerial images and of multitask learning methods on road segmentation were evaluated using three deep convolutional neural networks: UNet, DenseU-Net, and RoadVecNet. The color spaces included RGB, HSV, LAB, YCbCr, and YUV. The multitask learning methods adopted in this study involved utilizing multiple inputs and multiple outputs. Multiple inputs were aerial images from the same area in different color spaces, and multiple outputs were road segmentation and road outline segmentation. As remote sensing data, the National Land Survey of Finland's true orthophotos (from 2020), the Massachusetts road imagery dataset, and the Ottawa dataset were used. Segmentation masks for the National Land Survey of Finland's true orthophotos were extracted from Digiroad vectors with road width information. Road outline masks were generated from the segmentation masks. The studied neural networks were trained with the same data, learning rate, loss function, and optimizer for each color space and each pair of color spaces. Multiple outputs were tested with the RGB color space. The comparative analysis assessed the performance of the neural networks across the different color spaces using the F1-score metric. The experimental findings indicate that the choice of color space has little influence on the results: deep learning methods adapt well to different color spaces. In addition, the use of sharpening and edge enhancement augmentations had a slight effect on the results.
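The F1-score used for the comparative analysis above can be sketched pixel-wise for binary road masks. This is a minimal illustration; the study's actual evaluation pipeline may differ:

```python
def f1_score(pred, target):
    """F1-score for binary masks given as flat lists of 0/1 pixels."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    if tp == 0:
        return 0.0  # no true positives: F1 degenerates to zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```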
- Research Article
1
- 10.31590/ejosat.1013341
- Dec 8, 2021
- European Journal of Science and Technology
According to the World Health Organization, infertility is defined as the failure to achieve pregnancy despite a couple having unprotected sexual intercourse for one year. The cause of infertility may be male and/or female factors. In the diagnosis of male factors, sperm cells are analyzed under specific laboratory conditions. In this analysis, called a spermiogram, the morphological abnormality, characteristic motility, and concentration of the sperm are examined. Spermiogram tests can be performed manually by physicians or with computer-assisted sperm analysis systems. Because visual inspection yields results that vary from person to person and is costly, the importance of computer-assisted analysis is growing every day. In this study, the effect of different color spaces was investigated as a preprocessing step to increase the classification performance of a computer-based analysis approach for sperm morphology. Three sperm morphology datasets, abbreviated SMIDS, HuSHeM, and SCIAN-Morpho, were used in the experimental tests. Due to the imbalanced distribution of sperm images across classes and insufficient data, data augmentation was applied to the datasets. Then, to observe the effects of color space on classification, the datasets were converted into two well-known color spaces, LAB and HSV. MobileNetV2 was used as the classification model. To demonstrate the effects of the color spaces, the results were compared with a previously published study in which no color conversion was applied. Classifying images in the LAB and HSV color spaces yielded better results than classifying RGB images trained under the same conditions. Using color space conversions, maximum classification accuracies of 89%, 85%, and 68% were obtained for the SMIDS, HuSHeM, and SCIAN-Morpho datasets, respectively.
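The color-space conversion used above as a preprocessing step can be sketched with Python's standard library for the HSV case. LAB is not shown because it requires an RGB → XYZ → LAB pipeline that the standard library does not provide:

```python
import colorsys

def rgb_image_to_hsv(pixels):
    """Convert (r, g, b) pixels with floats in [0, 1] to HSV tuples.

    A minimal sketch of converting an image's pixel list before
    feeding it to a classifier; real pipelines operate on arrays.
    """
    return [colorsys.rgb_to_hsv(r, g, b) for r, g, b in pixels]
```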
- Research Article
16
- 10.1109/tcsvt.2022.3229701
- Jun 1, 2023
- IEEE Transactions on Circuits and Systems for Video Technology
Progressive coding is essential to the practical deployment of learned image compression over heterogeneous networks and clients. Existing methods for learned progressive image compression require complex and empirical design to achieve near-optimal rate-distortion performance over a wide range of bit-rates. However, these methods are limited by the implicit learned mechanism based on neural networks and by the introduction of uniform quantizers. In this paper, we propose generalized learned progressive image compression with analytic rate-distortion optimization using dead-zone quantizers on the latent representation. Specifically, we reveal that dead-zone quantizers, as a general case of uniform quantizers, are equivalent to uniform quantizers in fixed-rate nonlinear transform coding and can prevent extra redundancy in embedded quantization for progressive coding. Consequently, we propose rate-distortion optimized learned progressive coding by approximating the optimal quantizer in the source spaces using dead-zone quantizers in an analytic manner on the Laplacian source. To the best of our knowledge, this paper is the first to achieve general learned progressive coding from the perspective of optimal quantizers. The proposed method achieves theoretically sound and practically efficient embedded quantization and learned progressive coding of latent representations with improved rate-distortion performance. It can also enable embedded quantization with diverse assignments of truncation points and support flexible configuration of quality layers of varying numbers and at varying target bit-rates. Furthermore, we successfully incorporate the proposed method into existing pre-trained fixed-rate models to realize progressive learned image compression without re-training. Experimental results demonstrate that the proposed method achieves state-of-the-art rate-distortion performance in learned progressive image compression compared with traditional codecs and recent learned methods.
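A dead-zone scalar quantizer of the kind discussed above can be sketched in a generic textbook form. The bin widths below (half a step plus an extra `dz` margin around zero) are a common parameterization, not necessarily the one derived in the paper for the Laplacian source:

```python
def dead_zone_quantize(x, step, dz=0.0):
    """Dead-zone scalar quantizer index.

    Values with |x| below step/2 + dz map to 0 (the widened zero bin);
    outside the dead zone, quantization is uniform with step `step`.
    Setting dz = 0 recovers the ordinary uniform quantizer, matching
    the equivalence noted in the abstract.
    """
    if abs(x) < step / 2 + dz:
        return 0
    sign = 1 if x > 0 else -1
    return sign * round((abs(x) - dz) / step)

def dead_zone_dequantize(q, step, dz=0.0):
    """Reconstruct at the midpoint of the selected bin."""
    if q == 0:
        return 0.0
    sign = 1 if q > 0 else -1
    return sign * (abs(q) * step + dz)
```

Enlarging `dz` widens only the zero bin, which is what allows an embedded (progressive) bitstream to refine coefficients without extra redundancy.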
- Research Article
65
- 10.1109/tcsvt.2021.3119660
- Jun 1, 2022
- IEEE Transactions on Circuits and Systems for Video Technology
Recent works on learned image compression perform encoding and decoding in a full-resolution manner, resulting in two problems when deployed in practical applications. First, parallel acceleration of the autoregressive entropy model cannot be achieved due to serial decoding. Second, full-resolution inference often causes out-of-memory (OOM) problems with limited GPU resources, especially for high-resolution images. Block partitioning is a good choice for handling the above issues, but it brings new challenges in reducing the redundancy between blocks and eliminating block effects. To tackle these challenges, this paper provides a learned block-based hybrid image compression (LBHIC) framework. Specifically, we introduce explicit intra prediction into a learned image compression framework to exploit the relations among adjacent blocks. Going beyond the context modeling by linear weighting of neighboring pixels used in traditional codecs, we propose a contextual prediction module (CPM) to better capture long-range correlations by utilizing strip pooling to extract the most relevant information in the neighboring latent space, thus achieving effective information prediction. Moreover, to alleviate blocking artifacts, we further propose a boundary-aware postprocessing module (BPM) that takes edge importance into account. Extensive experiments demonstrate that the proposed LBHIC codec outperforms VVC, with bit-rate savings of 4.1%, and reduces decoding time by approximately 86.7% compared with state-of-the-art learned image compression methods.
- Research Article
16
- 10.3390/rs15082211
- Apr 21, 2023
- Remote Sensing
Learned image compression has achieved a series of breakthroughs for natural images, but there is little literature focusing on high-resolution remote sensing image (HRRSI) datasets. This paper focuses on designing a learned lossy image compression framework for compressing HRRSIs. Considering the local and non-local redundancy contained in HRRSIs, a mixed hyperprior network is designed to explore both the local and non-local redundancy in order to improve the accuracy of entropy estimation. In detail, a transformer-based hyperprior and a CNN-based hyperprior are fused for entropy estimation. Furthermore, to reduce the mismatch between training and testing, a three-stage training strategy is introduced to refine the network. In this training strategy, the entire network is first trained, and then some sub-networks are fixed while the others are trained. To evaluate the effectiveness of the proposed compression algorithm, experiments are conducted on an HRRSI dataset. The results show that the proposed algorithm achieves comparable or better compression performance than some traditional and learned image compression algorithms, such as Joint Photographic Experts Group (JPEG) and JPEG2000. At a similar or lower bitrate, the PSNR of the proposed algorithm is about 2 dB higher than that of JPEG2000.
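The PSNR figures quoted above follow the standard definition; a minimal sketch, assuming 8-bit imagery so that the peak value is 255:

```python
import math

def psnr(mse, peak=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse <= 0:
        raise ValueError("MSE must be positive for a finite PSNR")
    return 10 * math.log10(peak * peak / mse)
```

A 2 dB PSNR gap at the same bitrate corresponds to roughly a 37% reduction in mean squared error.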
- Research Article
- 10.1609/aaai.v39i10.33100
- Apr 11, 2025
- Proceedings of the AAAI Conference on Artificial Intelligence
Learned image compression (LIC) has achieved state-of-the-art rate-distortion performance, deemed promising for next-generation image compression techniques. However, pre-trained LIC models usually suffer from significant performance degradation when applied to out-of-training-domain images, implying their poor generalization capabilities. To tackle this problem, we propose a few-shot domain adaptation method for LIC by integrating plug-and-play adapters into pre-trained models. Drawing inspiration from the analogy between latent channels and frequency components, we examine domain gaps in LIC and observe that out-of-training-domain images disrupt pre-trained channel-wise decomposition. Consequently, we introduce a method for channel-wise re-allocation using convolution-based adapters and low-rank adapters, which are lightweight and compatible with mainstream LIC schemes. Extensive experiments across multiple domains and multiple representative LIC schemes demonstrate that our method significantly enhances pre-trained models, achieving comparable performance to H.266/VVC intra coding with merely 25 target-domain samples. Additionally, our method matches the performance of full-model fine-tuning while transmitting fewer than 2% of the parameters.
- Conference Article
20
- 10.1109/icip40778.2020.9190805
- Oct 1, 2020
Learned image compression (LIC) has reached parity with traditional hand-crafted methods such as JPEG2000 and BPG in terms of coding gain. However, the large model size of the network prohibits the use of LIC on resource-limited embedded systems. This paper presents an LIC with 8-bit fixed-point weights. First, we quantize the weights in groups and propose a non-linear, memory-free codebook. Second, we explore the optimal grouping and quantization scheme. Finally, we develop a novel weight-clipping fine-tuning scheme. Experimental results illustrate that the coding loss caused by the quantization is small, while the model size can be reduced by around 75% compared with the 32-bit floating-point anchor. As far as we know, this is the first work to explore and evaluate an LIC fully with fixed-point weights, and our proposed quantized LIC is able to outperform BPG in terms of MS-SSIM.
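The 8-bit fixed-point weight quantization explored above can be illustrated with a minimal symmetric per-group scheme. This sketch uses a plain linear scale; the paper's non-linear memory-free codebook, grouping search, and weight-clipping fine-tuning are deliberately omitted:

```python
def quantize_weights_8bit(weights):
    """Symmetric linear 8-bit quantization of one weight group.

    All weights in the group share one scale chosen so that the
    largest magnitude maps to +/-127.  A minimal sketch only.
    """
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid zero scale
    scale = max_abs / 127.0
    codes = [max(-128, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_weights(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]
```

Storing one 32-bit scale per group plus int8 codes is where the roughly 75% size reduction over 32-bit floats comes from.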
- Conference Article
10
- 10.1109/a-sscc56115.2022.9980666
- Nov 6, 2022
Recently, learned image compression (LIC) has shown superior ability in compression ratio as well as in the quality of the reconstructed image. By adopting the variational autoencoder framework, LIC [1] can outperform the intra prediction of the latest traditional coding standard, VVC. To accelerate coding speed, most LIC frameworks run on GPUs with floating-point arithmetic. However, mismatches in floating-point calculation results across hardware platforms cause decoding errors if encoding and decoding are performed on different platforms. Therefore, LIC with fixed-point arithmetic [2–3] is highly desirable. This paper presents an FPGA design for an LIC with 8-bit fixed-point quantization. Different from existing FPGA accelerators [4–6], we propose a fine-grained pipeline architecture to achieve high DSP efficiency. Cascaded DSPs and deconvolution with zero skipping are also developed to enhance hardware performance.
- Research Article
7
- 10.1016/j.neucom.2022.07.065
- Jul 22, 2022
- Neurocomputing
Successive learned image compression: Comprehensive analysis of instability
- Conference Article
9
- 10.1109/pcs50896.2021.9477479
- Jun 1, 2021
Rate-distortion optimization (RDO) of codecs, where distortion is quantified by the mean squared error, has been standard practice in image/video compression over the years. RDO serves well for optimizing codec performance when results are evaluated in terms of PSNR. However, it is well known that PSNR does not correlate well with perceptual evaluation of images; hence, RDO is not well suited for perceptual optimization of codecs. Recently, the rate-distortion-perception trade-off has been formalized by taking the Kullback-Leibler (KL) divergence between the distributions of the original and reconstructed images as a perception measure. Learned image compression methods that simultaneously optimize rate, mean-square loss, VGG loss, and an adversarial loss have been proposed. Yet, there exists no easy way to fix the rate, distortion, or perception at a desired level in a practical learned image compression solution in order to analyze the trade-off between rate, distortion, and perception measures. In this paper, we propose a practical approach to fixing the rate so as to carry out perception-distortion analysis at a fixed rate and thus perform perceptual evaluation of image compression results in a principled manner. Experimental results provide several insights for practical rate-distortion-perception analysis in learned image compression.
- Book Chapter
2
- 10.1007/978-3-031-19839-7_16
- Jan 1, 2022
In Cloud 3D, such as Cloud Gaming and Cloud Virtual Reality (VR), image frames are rendered and compressed (encoded) in the cloud and sent to clients for users to view. For low latency and high image quality, fast, high-compression-rate, high-quality image compression techniques are preferable. This paper explores computation-time reduction techniques for learned image compression to make it more suitable for Cloud 3D. More specifically, we employ slim (low-complexity) and application-specific AI models to reduce computation time without degrading image quality. Our approach is based on two key insights: (1) as the frames generated by a 3D application are highly homogeneous, application-specific compression models can improve rate-distortion performance over a general model; (2) many computer-generated frames from 3D applications are less complex than natural photos, which makes it feasible to reduce model complexity to accelerate compression computation. We evaluated our models on six gaming image datasets. The results show that our approach has rate-distortion performance similar to a state-of-the-art learned image compression algorithm while obtaining about 5x to 9x speedup and reducing compression time to less than 1 s (0.74 s), bringing learned image compression closer to being viable for Cloud 3D. Code is available at https://github.com/cloud-graphics-rendering/AppSpecificLIC. Keywords: Cloud gaming, Cloud virtual reality, Learned image compression, Model simplification, Application-specific modeling, Model-task balance
- Conference Article
339
- 10.1109/cvpr52688.2022.00563
- Jun 1, 2022
Recently, learned image compression techniques have achieved remarkable performance, even surpassing the best manually designed lossy image coders, and are promising candidates for large-scale adoption. For the sake of practicality, a thorough investigation of the architecture design of learned image compression, regarding both compression performance and running speed, is essential. In this paper, we first propose uneven channel-conditional adaptive coding, motivated by the observation of energy compaction in learned image compression. Combining the proposed uneven grouping model with existing context models, we obtain a spatial-channel contextual adaptive model that improves coding performance without harming running speed. We then study the structure of the main transform and propose an efficient model, ELIC, to achieve state-of-the-art speed and compression ability. With superior performance, the proposed model also supports extremely fast preview decoding and progressive decoding, which makes the coming application of learning-based image compression more promising.
- Conference Article
111
- 10.1109/cvpr.2019.01031
- Jun 1, 2019
Compression has been an important research topic for many decades, with a significant impact on data transmission and storage. Recent advances have shown the great potential of learned image and video compression. Inspired by related work, in this paper we present an image compression architecture using a convolutional autoencoder and then generalize image compression to video compression by adding an interpolation loop to both the encoder and decoder sides. Our basic idea is to realize spatial-temporal energy compaction in learned image and video compression. To this end, we propose adding a spatial energy compaction-based penalty to the loss function to achieve higher image compression performance. Furthermore, based on the temporal energy distribution, we propose selecting the number of frames in one interpolation loop to adapt to the motion characteristics of the video content. Experimental results demonstrate that our proposed image compression outperforms the latest image compression standard on the MS-SSIM quality metric and provides higher performance than state-of-the-art learned compression methods at high bit rates, which benefits from our spatial energy compaction approach. Meanwhile, our proposed video compression approach with temporal energy compaction significantly outperforms MPEG-4 and is competitive with the commonly used H.264. Both our image and video compression produce more visually pleasant results than traditional standards.
- Conference Article
5
- 10.1049/cp.2013.2616
- Jan 1, 2013
Today, an enormous amount of multimedia data is generated, transmitted, and stored on the internet, which has opened new research dimensions in the computing field. In recent work, hybrid wavelet transforms (HWT) generated with various constituent transforms have proven better than individual orthogonal transforms [7]. Later work has shown that HWT generated with varying proportions of constituent transforms gives better compression quality than equal proportions of constituent orthogonal transforms in HWT, depending on the compression ratio [1, 2]. Here, an appraisal of the effect of color spaces on image compression using HWT generated with varying proportions of constituent transforms is presented. The experiments were conducted on a test bed of 15 images of varied sizes with eight compression ratios (60% to 95%). The results show that for the higher compression ratio of 95%, the LUV color space gives better compression quality than the other considered color spaces with HWT generated from a 4:1 proportion of Cosine-Sine constituent transforms. For 65% to 90% compression ratios, HWT generated from a 1:1 proportion of Cosine-Kekre constituent transforms with the RGB color space gives a lower average mean square error (MSE). For the lower compression ratio of 60%, HWT generated with a 1:4 proportion of Cosine-Kekre constituent transforms gives better image compression quality with the RGB color space.
- Conference Article
3
- 10.1109/icip40778.2020.9190974
- Oct 1, 2020
With recent advances in learned entropy and context models, the rate-distortion performance of deep learned image compression methods has reached or surpassed that of conventional codecs. However, learned image compression is currently more complex and slower than conventional image compression. Learned image and video compression methods almost exclusively employ the generalized divisive normalization (GDN) activation function. This paper investigates the effect of the activation function on image compression performance in terms of both objective and subjective criteria as well as runtime. In particular, we show that the distribution of latents produced by hard shrinkage fits a Laplacian better, and that similar rate-distortion and better visual performance can be achieved using hard shrinkage at lower complexity.
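The hard shrinkage activation studied above has a simple closed form, which helps explain the lower-complexity claim relative to GDN (no division or learned normalization parameters). A minimal scalar sketch; the threshold 0.5 is an arbitrary illustrative value, not the setting used in the paper:

```python
def hard_shrink(x, lambd=0.5):
    """Hard shrinkage: zero inside [-lambd, lambd], identity outside.

    Zeroing small latents concentrates the surviving values away
    from the origin, consistent with the Laplacian-fit observation.
    """
    return x if abs(x) > lambd else 0.0
```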