Sample Domain Prediction and Transform Skip for Region Adaptive Hierarchical Transform in Geometric Point Cloud Compression
Point cloud compression is critical for the success of immersive multimedia applications. For attribute compression in geometric point cloud compression (G-PCC), the Region Adaptive Hierarchical Transform (RAHT) is the preferred coding method. Although RAHT was initially introduced as a pure transform coding tool, recent advancements have introduced intra and inter prediction for RAHT. However, these methods perform prediction in the transform domain, which is sub-optimal since: i) fixed-point RAHT introduces distortion to the prediction signal, and ii) transforming the prediction signal adds decoding complexity. To address this, we propose performing prediction in the sample domain, thereby retaining a crisp prediction signal and relieving the decoder of unnecessary computations. Performing prediction in the sample domain also opens the door to skipping the transform stage entirely at the decoder when all residues of a block are quantized to zero, leading to further complexity reduction. The proposed methods achieve an average chroma coding gain of around 1% and reduce overall decoding complexity by 3-5%. The method has been adopted into the next version of the Geometric Solid Test Model (GeS-TM v5.0) and is being evaluated on the G-PCC test model TMC13v25.
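The interplay between sample-domain prediction and transform skip can be illustrated with a toy sketch. This is not the G-PCC/RAHT implementation: a 2-point Haar butterfly stands in for RAHT, and `encode_block`, `decode_block`, and the uniform quantizer are hypothetical helpers.

```python
import numpy as np

def haar_pair(a, b):
    # Toy orthonormal 2-point transform standing in for one RAHT butterfly
    # (real RAHT uses occupancy-weighted, fixed-point butterflies).
    return (a + b) / np.sqrt(2), (a - b) / np.sqrt(2)

def quantize(x, step):
    return np.round(x / step).astype(int)

def encode_block(cur, pred, step):
    # Sample-domain prediction: the residual is formed BEFORE any transform,
    # so the prediction signal is never distorted by fixed-point RAHT.
    resid = cur - pred
    dc, ac = haar_pair(resid[0], resid[1])
    return quantize(np.array([dc, ac]), step)

def decode_block(q, pred, step):
    if not q.any():
        # Transform skip: all coefficients are zero, so the decoder
        # simply copies the sample-domain prediction -- no inverse transform.
        return pred.copy()
    dc, ac = q * step
    r0 = (dc + ac) / np.sqrt(2)
    r1 = (dc - ac) / np.sqrt(2)
    return pred + np.array([r0, r1])

cur = np.array([100.0, 102.0])
pred = np.array([99.0, 101.0])
q = encode_block(cur, pred, step=8.0)   # coarse step -> residual quantized to zero
rec = decode_block(q, pred, step=8.0)   # decoder skips the inverse transform
```

When the quantization step is coarse enough that every residual coefficient rounds to zero, the decoder's only work is copying the prediction, which is the complexity saving the abstract refers to.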
- Research Article
- 10.1109/tip.2025.3565992
- Jan 1, 2025
- IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Point cloud compression is critical for the success of immersive multimedia applications. For attribute compression in geometric point cloud compression (G-PCC), the Region Adaptive Hierarchical Transform (RAHT) is the preferred coding method. This paper presents several advances to predictive coding with RAHT: 1) Sample Domain Prediction: Prediction in RAHT is conventionally performed in the transform domain. This introduces undesirable distortion to the prediction signal because of fixed-point computations and increases decoding complexity. We address this by applying prediction naturally in the sample domain. The method opens the door to skipping the transform stage altogether when all residues are quantized to zero, leading to a significantly lighter decoder. 2) Reference Node Resampling: The inter-prediction signal derived in RAHT can have a different occupancy and weight distribution than the current block, causing a mismatch. To address this, we resample the reference node to align the occupancy and weight distributions. 3) Temporal Filtering: During inter-prediction, the reference node is simply copied as the prediction signal. This assumes a correlation coefficient of unity, which rarely holds. We introduce a temporal filtering mechanism, conditioned on the sub-band, that emulates low-pass filtering and achieves improved prediction. 4) Inter-Eligibility: During AC inter-prediction, both the encoder and decoder have access to the DC coefficients of the current and reference nodes. We use this information to derive an inter-eligibility criterion. Experimental results show considerable gains and reduced complexity, demonstrating the utility of the proposed methods. All the presented methods have been adopted into the second version of G-PCC.
- Research Article
- 10.1109/tip.2025.3578760
- Jan 1, 2025
- IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Efficiently compressing the large volumes of attribute data in point clouds is a challenging problem. Despite notable advancements in lossy point cloud compression using deep learning, progress in lossless compression remains limited. Some methods have employed octree- or voxel-based partitioning techniques derived from geometry compression, achieving success on dense point clouds. However, these voxel-based approaches struggle with sparse or unevenly distributed point clouds, leading to performance degradation. In this work, we introduce a novel framework for learning-based lossless point cloud attribute compression, named LOD-PCAC, which leverages a Level-of-Detail (LOD) structure to ensure density-robust compression. Specifically, the input point cloud is divided into multiple detail levels, and vertices from these levels are selected to construct a Reference Set as context, which effectively captures multi-level information. We then propose the Bit-level Residual Coder for efficient attribute compression. Instead of directly compressing attributes, our method first predicts attribute values and organizes the residual bits into a Bit Matrix as another context, simplifying predictions and fully exploiting channel correlations. Finally, a neural network with specialized encoders processes the context to estimate the probability of each residual bit. Experimental results demonstrate that the proposed method outperforms both traditional and learning-based approaches across various point clouds, exhibiting strong generalization across datasets and robustness to varying densities.
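The bit-matrix idea can be sketched with a minimal example. The details below (a zigzag map to non-negative integers, then MSB-first bit planes) are assumptions for illustration; the paper's actual Bit-level Residual Coder and its contexts are more elaborate.

```python
import numpy as np

def to_unsigned(r):
    # Zigzag map signed residuals to non-negative ints: 0,-1,1,-2,2 -> 0,1,2,3,4
    return np.where(r >= 0, 2 * r, -2 * r - 1)

def bit_matrix(residuals, nbits=8):
    # Rows: points; columns: bit planes (MSB first) for one attribute channel.
    # Each bit can then be coded with a learned probability model.
    u = to_unsigned(residuals)
    return np.array([[(v >> b) & 1 for b in range(nbits - 1, -1, -1)] for v in u])

attrs = np.array([120, 122, 119, 125])   # one colour channel
preds = np.array([118, 121, 121, 120])   # hypothetical predictor output
M = bit_matrix(attrs - preds)            # residuals 2, 1, -2, 5 -> rows of bits
```

Coding bit planes rather than whole symbols lets the entropy model condition each bit on the already-decoded higher-order bits, which is the "context" role the abstract describes.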
- Research Article
6
- 10.3390/s25061660
- Mar 7, 2025
- Sensors (Basel, Switzerland)
This meta-survey provides a comprehensive review of 3D point cloud (PC) applications in remote sensing (RS), essential datasets available for research and development purposes, and state-of-the-art point cloud compression methods. It explores the diverse applications of point clouds in remote sensing, including specialized tasks within the field, precision-agriculture-focused applications, and broader general uses. Furthermore, datasets commonly used in remote-sensing-related research and development are surveyed, including urban, outdoor, and indoor environment datasets; vehicle-related datasets; object datasets; agriculture-related datasets; and other more specialized datasets. Due to their importance in practical applications, this article also surveys point cloud compression technologies, from widely used tree- and projection-based methods to more recent deep learning (DL)-based technologies. This study synthesizes insights from previous reviews and original research to identify emerging trends, challenges, and opportunities, serving as a valuable resource for advancing the use of point clouds in remote sensing.
- Research Article
6
- 10.1109/tcsvt.2021.3129071
- Dec 1, 2021
- IEEE Transactions on Circuits and Systems for Video Technology
A point cloud is a set of 3D points that can be used to represent a 3D surface. Each point has a spatial position (x, y, z) and a vector of attributes, such as color, material reflectance, or normals. As point clouds can reconstruct 3D objects or scenes, they have the potential to be widely used in applications such as autonomous driving and six-degree-of-freedom virtual reality. However, the following properties of point clouds make their compression and processing rather challenging. 1) Unstructured. A point cloud is a series of non-uniformly sampled points. On the one hand, this makes correlations among points difficult to exploit for compression. On the other hand, the convolutional neural networks widely used in image/video processing cannot be directly applied to point cloud processing. 2) Unordered. Unlike images and videos, a point cloud is a set of points without a specific order. Therefore, both point cloud processing and compression algorithms need to be invariant to any permutation of the input points.
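The permutation-invariance requirement in point 2 can be made concrete: any symmetric aggregation over the point set, such as the per-dimension max pooling popularized by PointNet-style networks, produces the same feature for any ordering of the input. The sketch below is illustrative, not a method from the abstract.

```python
import numpy as np

def global_feature(points):
    # A symmetric (per-dimension max) aggregation: the result is
    # invariant to any permutation of the rows (points).
    return points.max(axis=0)

pts = np.array([[0.0, 1.0, 2.0],
                [3.0, 0.5, 1.0],
                [1.0, 2.0, 0.0]])
shuffled = pts[[2, 0, 1]]                 # same points, different order
f1 = global_feature(pts)
f2 = global_feature(shuffled)             # identical to f1
```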
- Conference Article
9
- 10.1109/icassp39728.2021.9413902
- Jun 6, 2021
With the increasing demand for 3D modeling from emerging immersive applications, the 3D point cloud has become an essential representation format for processing 3D images and video. Because of the inherent sparsity of 3D data and the significant memory required to represent points, point cloud processing is a challenging task. In this paper, we propose a novel data structure for representing point clouds with reduced memory requirements and faster lookup than state-of-the-art formats. The proposed format is examined for temporal encoding in geometric point cloud compression. Our simulation results show that the proposed temporal prediction enhances compression rate and quality by 13-33% compared to MPEG G-PCC. Moreover, the proposed data structure provides 16-54× faster point lookup operations and more than a 1.4× reduction in memory consumption compared to the octree structure used in MPEG G-PCC.
- Research Article
25
- 10.1109/tmm.2022.3154927
- Jan 1, 2023
- IEEE Transactions on Multimedia
With the growth of Extended Reality (XR) and capturing devices, point cloud representation has become attractive to academics and industry. Point Cloud Compression (PCC) algorithms further promote numerous XR applications that may change our daily life. However, in the literature, PCC algorithms are often evaluated with heterogeneous datasets, metrics, and parameters, making the results hard to interpret. In this article, we propose an open-source benchmark platform called PCC Arena. Our platform is modularized in three aspects: PCC algorithms, point cloud datasets, and performance metrics. Users can easily extend PCC Arena in each aspect to fulfill the requirements of their experiments. To show the effectiveness of PCC Arena, we integrate seven PCC algorithms into PCC Arena along with six point cloud datasets. We then compare the algorithms on ten carefully selected metrics to evaluate the quality of the output point clouds. We further conduct a user study to quantify the user-perceived quality of rendered images produced by different PCC algorithms. Several novel insights are revealed in our comparison: (i) Signal Processing (SP)-based PCC algorithms are stable across usage scenarios, but the trade-offs between coding efficiency and quality should be carefully addressed, (ii) Neural Network (NN)-based PCC algorithms have the potential to consume lower bitrates while providing results similar to SP-based algorithms, (iii) NN-based PCC algorithms may generate artifacts and suffer from long running times, and (iv) NN-based PCC algorithms merit more in-depth study, as recently proposed ones improve both quality and running time. We believe that PCC Arena can play an essential role in allowing engineers and researchers to better interpret and compare the performance of future PCC algorithms.
- Research Article
3
- 10.1504/ijahuc.2022.121116
- Jan 1, 2022
- International Journal of Ad Hoc and Ubiquitous Computing
It is critical that autonomous navigation systems can segment the objects captured by their sensors (cameras or LiDAR scanners) in real time. In this paper, a convolutional neural network (CNN) is proposed for real-time semantic segmentation of road objects (pedestrians, cars, cyclists). The proposed network structure is based on the lightweight network SqueezeNet, which is small enough to be stored directly in the embedded deployment of an autonomous vehicle. The input of the proposed CNN is the transformed 3D LiDAR point cloud, and a domain transform (DT) aligns the segmentation output precisely with object boundaries, yielding a more accurate point-wise label map as the output. In addition to comparing our segmentation results with deep-learning-based pipelines, a visual comparison with traditional 3D point cloud segmentation pipelines is also made. Experiments show that the proposed CNN achieves a fast running time (6.2 ms per frame) and realizes real-time semantic segmentation of objects in autonomous driving scenes while ensuring comparable segmentation accuracy.
- Research Article
2
- 10.1109/tpami.2025.3594355
- Nov 1, 2025
- IEEE transactions on pattern analysis and machine intelligence
With the maturity of 3D capture technology, the explosive growth of point cloud data has burdened storage and transmission. Traditional hybrid point cloud compression (PCC) tools relying on handcrafted priors have limited compression performance and are increasingly unable to cope with the burden induced by data growth. Recently, deep learning-based PCC methods have been introduced to continue pushing the PCC performance boundary. With the thriving of deep PCC, the community urgently demands a systematic overview that summarizes past progress and presents future research directions. In this paper, we provide a detailed review that covers popular point cloud datasets, algorithm evolution, benchmarking analysis, and future trends. Concretely, we first introduce several widely used PCC datasets according to their major properties. Then the algorithm evolution of existing studies on deep PCC, including lossy and lossless methods proposed for various point cloud types, is reviewed. Apart from academic studies, we also investigate the development of relevant international standards (i.e., MPEG and JPEG standards). To facilitate an in-depth understanding of the advances in deep PCC, we select a representative set of methods and conduct extensive experiments on multiple datasets. Comprehensive benchmarking comparisons and analysis reveal the pros and cons of previous methods. Finally, based on this analysis, we highlight the challenges and future trends of deep learning-based PCC, paving the way for further study.
- Conference Article
6
- 10.1109/dcc.2019.00058
- Mar 1, 2019
The point cloud media representation format has provided various opportunities for extended reality applications and has become widely used in volumetric content capturing scenarios. At the same time, ambiguous storage format representations and limited network throughput are key obstacles to wide adoption of this media format. Compression algorithms in the corresponding standardization activities aim to solve this problem; the MPEG-I standard in particular aims to create a point cloud compression methodology that relies on existing video coding hardware implementations. In the state-of-the-art video-based dynamic point cloud (DPC) compression method, similar 3D patches may be projected to entirely different 2D positions in different frames. As a result, motion vector predictors, especially those at patch boundaries, may be very inaccurate, which can lead to a significant bitrate increase. In this paper, we propose to use the reconstructed geometry information to predict motion vectors more accurately and improve the coding efficiency of the attribute video. First, we propose to use the motion vector of the co-located block in the geometry frame as a merge candidate for the current block in the attribute frame. Second, we perform motion estimation between the current reconstructed point cloud (with only geometry information) and the reference point cloud to find the corresponding block; the derived motion information is used as the motion vector predictor of the current block in the attribute frame. To the best of our knowledge, this is the first work using geometry information to compress attributes in the DPC compression scenario. Significant compression efficiency is achieved with this new 3D geometry-derived motion prediction scheme compared with the state-of-the-art DPC compression method.
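The first proposed candidate, reusing the co-located geometry-frame motion vector when coding the attribute frame, can be sketched as follows. The `geom_mvs` table and block indexing are hypothetical stand-ins for the codec's internal structures; the point is that only a small motion vector difference remains to be coded.

```python
def mv_predictor(geom_mvs, block_xy):
    # Merge-style candidate: the motion vector of the co-located block in
    # the geometry frame, reused for the attribute frame (geometry and
    # attribute patches share the same 2D projection).
    return geom_mvs[block_xy]

# Hypothetical per-block motion vectors from the already-coded geometry frame.
geom_mvs = {(0, 0): (4, -2), (0, 1): (1, 0)}

true_mv = (5, -2)                 # MV actually found for the attribute block
pred = mv_predictor(geom_mvs, (0, 0))
mvd = (true_mv[0] - pred[0],
       true_mv[1] - pred[1])      # only this small difference is transmitted
```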
- Conference Article
9
- 10.1109/dcc50243.2021.00085
- Mar 1, 2021
3D point clouds have been widely applied in virtual reality and augmented reality. A complex 3D scene always requires a large number of points to represent and demands substantial storage space. Thus, point cloud compression is a crucial research issue. In this paper, we propose a novel lossy geometry compression method based on an autoencoder optimized with a DCGAN. This method can reconstruct a high-quality point cloud and handles large areas of missing points in the compression and decompression process. To improve codec performance, we propose a multi-scale 3D deconvolution skip-connection structure to obtain a better-quality reconstructed point cloud at low bit rates. To our knowledge, our approach is the first GAN-based point cloud compression algorithm. Compared with state-of-the-art methods on the MVUB dataset, our approach achieves better rate-distortion performance and visual quality.
- Conference Article
2
- 10.1109/dicta60407.2023.00038
- Nov 28, 2023
Recently, point cloud processing has become popular in AI-driven areas as 3D scanners develop rapidly. However, this kind of data can have a massive file size, causing significant storage and transmission difficulties. Compressing point clouds is challenging due to their disordered, sparse, and irregular structure. Therefore, there is a growing need for effective methods that compress point clouds while preserving their information. So far, many methods based on voxel and octree structures have been reported. However, these methods suffer from loss of local detail at early stages, especially during the down-sampling step. In addition, while the global attention mechanism of Transformers is strong at capturing long-range dependencies, it has limitations in capturing local geometric details. To address these issues, we propose a Transformer-based point cloud geometry compression method with a local neighbor aggregation module that preserves local spatial features during compression. Our method is based on an autoencoder architecture, and the Local Neighbor Aggregation module addresses the local feature-capturing limitations of global attention and the local spatial data loss in Transformers. Compared with other methods, ours achieves an average of 30.49% and 23.67% bitrate savings in terms of PSNR D1 and PSNR D2, respectively, with a shorter decoding time.
- Conference Article
1
- 10.1109/iceic51217.2021.9369717
- Jan 31, 2021
Geometry-based Point Cloud Compression (G-PCC) and Video-based Point Cloud Compression (V-PCC), standards developed by MPEG, can be used to compress point cloud data. However, these standards are not yet supported for encoding and decoding on many devices. To make point cloud data usable on many devices, we conducted a study on compressing and restoring point clouds using a legacy codec. Legacy codecs have already been used and verified on many devices, so if point cloud data can be compressed and restored with a legacy codec, it can be used on many devices. In this paper, we propose a point cloud compression method based on MP3, chosen for its proven, popular use and ubiquitous encoder and decoder support. We compared the proposed method with the G-PCC reference software for performance evaluation.
- Conference Article
2
- 10.1109/ispacs51563.2021.9651108
- Nov 16, 2021
In this paper, we present a novel 3D structure-aware image-based point cloud compression scheme, which applies the proposed Symmetry-based Convolutional Neural Pyramid (SCNP) to compress colored point clouds view by view for 3D model transmission. Given an input 3D model, a preprocessing step first represents the input point cloud as a sequence of view-specific six-dimensional (6D) images, where each pixel is characterized by an RGB color vector and an XYZ 3D point. The transformed 6D images preserve a regular grid structure, so redundant information is easily removed by conventional image/video compression techniques. The SCNP first represents each 6D image as a multi-level pyramid structure for progressive compression and transmission. The lowest-resolution image at the highest level of the pyramid is then decomposed into multiple patches, each coded as an index into a small dictionary via vector quantization. The residual images at the other levels are also represented by vector quantization codes with different patch sizes for progressively reconstructing the input colored point cloud. This process yields a multiple-description coding scheme for 3D point cloud compression. With the pre-learned set of dictionaries, the projected view-specific 6D images of the input 3D model are encoded one by one to obtain the compressed results for 3D model transmission. At the receiver end, the 3D model is reconstructed by merging all the reconstructed point clouds, each decoded from its corresponding view-specific image. Finally, a conventional 3D reconstruction approach is applied to remove redundant 3D points when reconstructing the 3D model. Experiments demonstrate the effectiveness of our approach, which outperforms current state-of-the-art point cloud compression methods.
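The per-patch vector quantization step can be sketched in miniature. The two-dimensional "patches" and three-entry codebook below are hypothetical and far smaller than a learned dictionary would be; only the nearest-index encode/decode mechanism is illustrated.

```python
import numpy as np

def vq_encode(patches, dictionary):
    # Replace each patch by the index of its nearest dictionary entry
    # (squared Euclidean distance), as in per-level dictionary coding.
    d = ((patches[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def vq_decode(indices, dictionary):
    # Reconstruction is a simple table lookup at the receiver.
    return dictionary[indices]

dictionary = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # tiny "pre-learned" codebook
patches = np.array([[0.1, 0.0], [0.9, 1.1], [0.2, 0.9]])
idx = vq_encode(patches, dictionary)     # only these indices are transmitted
rec = vq_decode(idx, dictionary)
```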
- Research Article
2
- 10.3390/electronics14071295
- Mar 25, 2025
- Electronics
As 5G technology and 3D capture techniques develop rapidly, the demand for effectively compressing dynamic 3D point cloud data has increased remarkably. Video-based point cloud compression (V-PCC), an innovative method for 3D point cloud compression, uses High-Efficiency Video Coding (HEVC) to compress 3D point clouds by projecting them onto two-dimensional video frames. However, V-PCC faces significant coding complexity, particularly for dynamic 3D point clouds, which can be up to four times more complex to process than conventional video. To address this challenge, we propose an adaptive coding unit (CU) partitioning method that integrates occupancy maps, convolutional neural networks (CNNs), and Bayesian optimization. In this approach, CUs are first divided into dense, sparse, and complex composite regions by calculating the occupancy rate R of each CU, and an initial classification decision is made using a CNN. For regions where the CNN outputs low-confidence classifications, Bayesian optimization refines the partitioning to enhance accuracy. Experimental results show that the proposed method efficiently decreases the coding complexity of V-PCC while maintaining high coding quality. Specifically, the average coding time is reduced by 57.37% for the geometry video, 54.43% for the attribute video, and 54.75% overall. Although the BD-rate slightly increases compared with that of the baseline V-PCC method, the impact on video quality is negligible. Additionally, the proposed algorithm outperforms existing methods in terms of geometric compression efficiency and computational time savings.
This study’s innovation lies in combining deep learning with Bayesian optimization to deliver an efficient CU partitioning strategy for V-PCC, improving coding speed and reducing computational resource consumption, thereby advancing the practical application of V-PCC.
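The occupancy-rate pre-classification can be sketched as follows. The thresholds `lo` and `hi` are hypothetical, since the actual dense/sparse/composite decision in the paper is made by a CNN and refined with Bayesian optimization; only the rate computation over the occupancy map is shown.

```python
import numpy as np

def occupancy_rate(occ_map, x, y, size):
    # Fraction of occupied pixels inside the CU's footprint in the
    # binary occupancy map that accompanies every V-PCC frame.
    block = occ_map[y:y + size, x:x + size]
    return float(block.mean())

def classify_cu(r, lo=0.2, hi=0.8):
    # Hypothetical thresholds for illustration only.
    if r >= hi:
        return "dense"
    if r <= lo:
        return "sparse"
    return "composite"

occ = np.zeros((8, 8), dtype=np.uint8)
occ[:4, :4] = 1                      # one occupied quadrant
r = occupancy_rate(occ, 0, 0, 8)     # 16 / 64 = 0.25
label = classify_cu(r)
```

Routing mostly-empty CUs to cheap split decisions is what saves encoding time: fully dense or fully sparse blocks rarely need an exhaustive rate-distortion search over all partition depths.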
- Book Chapter
1
- 10.1016/b978-0-32-391755-1.00019-5
- Jan 1, 2023
- Immersive Video Technologies
Chapter 13 - Point cloud compression