3QNet
With the development of 3D applications, point clouds, a spatial representation easily acquired by sensors, have been widely used in areas such as SLAM and 3D reconstruction. Point Cloud Compression (PCC) has also attracted growing attention as a primary step before point clouds are transmitted and stored. Geometry compression, which compresses the geometric structure of the points, is an important component of PCC. However, existing non-learning-based geometry compression methods are often limited by manually predefined compression rules. Although learning-based compression methods can significantly improve performance by learning compression rules from data, they still have shortcomings. Voxel-based compression networks introduce precision errors through voxelization, while point-based methods may be relatively less robust and are mainly designed for sparse point clouds. In this work, we propose a novel learning-based point cloud compression framework named 3D Point Cloud Geometry Quantization Compression Network (3QNet), which overcomes the robustness limitation of existing point-based methods and can handle dense points. By learning a codebook of common structural features from simple and sparse shapes, 3QNet can efficiently handle many kinds of point clouds. Experiments on object models, indoor scenes, and outdoor scans show that 3QNet achieves better compression performance than many representative methods.
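The core idea of a learned codebook can be illustrated with plain nearest-neighbour vector quantization: each feature vector is replaced by the index of its closest codeword, and only that index is transmitted. The sketch below is a minimal illustration under stated assumptions, not 3QNet's architecture; the function name `quantize`, the 2D features, and the fixed four-entry codebook are hypothetical (3QNet learns its codebook from data).

```python
import numpy as np

def quantize(features, codebook):
    """Replace each feature vector with its nearest codeword.

    features: (N, D) array, e.g. per-patch latent features (hypothetical input).
    codebook: (K, D) array of codewords. 3QNet learns its codebook from data;
    here it is simply given.
    Returns the codeword indices (what would be transmitted) and the
    dequantized features (what the decoder would reconstruct).
    """
    # Squared L2 distance between every feature and every codeword, shape (N, K).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
feats = np.array([[0.1, 0.9], [0.95, 0.05]])
idx, recon = quantize(feats, codebook)
# Each index costs log2(K) bits before any entropy coding.
bits_per_index = int(np.ceil(np.log2(len(codebook))))
```

Here `idx` is `[2, 1]`, so the two features are sent as two 2-bit indices instead of four floating-point numbers.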
- Conference Article
8
- 10.1109/icmew56448.2022.9859339
- Jul 18, 2022
Traditional point cloud compression (PCC) methods are not effective in extremely low bit rate scenarios because of their uniform quantization. Although learning-based PCC approaches can achieve superior compression performance, they need to train multiple models for different bit rates, which greatly increases training complexity and memory storage. To tackle these challenges, this paper proposes a novel FoldingNet-based Point Cloud Geometry Compression (FN-PCGC) framework. First, the point cloud is divided into several descriptions by a Multiple-Description Generation (MDG) module. Then, a point-based autoencoder with Multi-scale Feature Extraction (MFE) is introduced to compress all the descriptions. Experimental results show that the proposed method outperforms MPEG G-PCC and Draco with average gains of about 30% to 80%.
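The abstract does not say how the MDG module splits the cloud. One simple way to produce multiple descriptions is a random disjoint partition, so that each description is a sparse subsample of the whole shape and decoding more descriptions yields a denser result. The sketch below assumes that rule; `split_descriptions` and its random split are hypothetical stand-ins, not FN-PCGC's actual module.

```python
import numpy as np

def split_descriptions(points, k, seed=0):
    """Randomly partition an (N, 3) point cloud into k disjoint descriptions.

    Hypothetical stand-in for a Multiple-Description Generation step: every
    description covers the whole shape at lower density, so any subset of
    decoded descriptions gives a sparser version of the full cloud.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(points))
    # np.array_split handles N not divisible by k (sizes differ by at most 1).
    return [points[idx] for idx in np.array_split(perm, k)]

points = np.random.default_rng(1).random((10, 3))
descs = split_descriptions(points, k=3)
merged = np.concatenate(descs)  # union of all descriptions = original points
```

Because the descriptions are disjoint, the bit budget can be controlled by how many descriptions are transmitted, which is one way a single model could serve several rates.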
- Research Article
9
- 10.1109/jsen.2022.3225170
- Jan 15, 2023
- IEEE Sensors Journal
Point cloud compression is an essential task for practical applications using point clouds. Most previous approaches rely on octree compression, which involves voxelization in the coding itself. Distortions caused by voxelization can be reduced by postprocessing without increasing the bitrate. In this article, we propose a super-resolution method for decoded voxelized point clouds as a postprocessing step in geometry compression. The proposed method increases the resolution of the voxelized point cloud by predicting the occupancy of voxels at a higher resolution than the one used to compress the original point cloud. For efficient prediction, we propose a deep neural network for super-resolution based on sparse convolution. The network remains efficient even for large point clouds because it applies convolution only to nonempty space. The proposed method predicts occupancies as continuous values for each point and estimates binary occupancies through a thresholding procedure. We design a dynamic threshold that guarantees at least one voxel is predicted to be occupied, preventing the generation of regions with missing points. We also introduce an occupancy prediction method to address the sparsity of occupied high-resolution voxels. Experiments on outdoor and indoor datasets demonstrate the effectiveness of the proposed method.
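A dynamic threshold of this kind can be sketched by grouping candidate voxels per decoded voxel and lowering the threshold for a group whenever no prediction clears the fixed value. The grouping into eight children per decoded voxel and the `min(threshold, group max)` rule below are assumptions for illustration; the paper's exact rule may differ, and `binarize_occupancy` is a hypothetical name.

```python
import numpy as np

def binarize_occupancy(probs, threshold=0.5):
    """Turn predicted occupancy probabilities into binary occupancy.

    probs: (M, 8) array, one row per decoded voxel and one column per
    candidate child voxel at the higher resolution (grouping is an assumption).
    A fixed threshold could leave a row with no occupied child, creating a
    hole, so each row's threshold is lowered to at most that row's maximum.
    """
    row_max = probs.max(axis=1, keepdims=True)
    dynamic = np.minimum(threshold, row_max)  # per-row threshold
    return probs >= dynamic

probs = np.array([
    [0.9, 0.1, 0.6, 0.2, 0.0, 0.0, 0.0, 0.0],   # two children clear 0.5
    [0.1, 0.3, 0.2, 0.05, 0.0, 0.0, 0.0, 0.0],  # none clear 0.5
])
occ = binarize_occupancy(probs)
```

In the second row no probability reaches 0.5, so the threshold drops to 0.3 and exactly one child is kept instead of none.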
- Research Article
- 10.1049/ell2.13080
- Jan 1, 2024
- Electronics Letters
Previous point cloud compression methods consider only reducing the amount of data. However, in applications such as autonomous driving, compression methods must not only enable smooth transmission but also support downstream tasks effectively. To this end, a task-driven sampling network based on graph convolution is proposed to achieve point cloud compression and recovery. First, a downsampling network is presented to simplify and compress the point cloud; to optimize the compressed point cloud for downstream tasks, a task loss is added to the loss function for end-to-end training. Then, an upsampling network with a residual correction unit is presented to recover and reconstruct the point cloud. Experiments on the point cloud classification task on the ModelNet40 dataset show that the compressed point cloud obtained through our network achieves higher classification accuracy than other similar methods, and that the reconstructed point cloud further improves classification accuracy.
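The abstract only states that a task loss is added to the training loss. A common pairing in sampling networks is a Chamfer-distance term that keeps sampled points close to the input plus a cross-entropy term from a downstream classifier. The sketch below assumes those two terms and a weight `lam`; all names are hypothetical, and NumPy only evaluates the loss (end-to-end training would need an autodiff framework).

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # (N, M) squared distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample's class logits."""
    z = logits - logits.max()  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def total_loss(simplified, original, logits, label, lam=1.0):
    # Geometry term keeps the sampled points close to the input shape;
    # the task term rewards samples that keep the classifier correct.
    return chamfer_distance(simplified, original) + lam * cross_entropy(logits, label)

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
cd = chamfer_distance(a[:1], a)                      # sample 1 of 2 points
loss = total_loss(a, a, np.zeros(4), 0, lam=2.0)     # uniform logits, 4 classes
```

The weight `lam` sets the trade-off: a larger value favours classification accuracy over geometric fidelity of the sampled points.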
- Research Article
25
- 10.1109/tmm.2022.3154927
- Jan 1, 2023
- IEEE Transactions on Multimedia
With the growth of Extended Reality (XR) and capturing devices, point cloud representation has become attractive to academia and industry. Point Cloud Compression (PCC) algorithms further promote numerous XR applications that may change our daily lives. However, in the literature, PCC algorithms are often evaluated with heterogeneous datasets, metrics, and parameters, making the results hard to interpret. In this article, we propose an open-source benchmark platform called PCC Arena. Our platform is modularized in three aspects: PCC algorithms, point cloud datasets, and performance metrics. Users can easily extend PCC Arena in each aspect to fulfill the requirements of their experiments. To show the effectiveness of PCC Arena, we integrate seven PCC algorithms into PCC Arena along with six point cloud datasets. We then compare the algorithms on ten carefully selected metrics to evaluate the quality of the output point clouds. We further conduct a user study to quantify the user-perceived quality of rendered images produced by different PCC algorithms. Our comparison reveals several novel insights: (i) Signal Processing (SP)-based PCC algorithms are stable across usage scenarios, but the trade-offs between coding efficiency and quality should be carefully addressed; (ii) Neural Network (NN)-based PCC algorithms have the potential to consume lower bitrates while providing results similar to SP-based algorithms; (iii) NN-based PCC algorithms may generate artifacts and suffer from long running times; and (iv) NN-based PCC algorithms merit more in-depth study, as recently proposed NN-based PCC algorithms improve both quality and running time. We believe that PCC Arena can play an essential role in helping engineers and researchers better interpret and compare the performance of future PCC algorithms.
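The abstract does not list the ten metrics. One widely used geometry metric is point-to-point PSNR, built from nearest-neighbour squared distances in both directions. The sketch below is a simplified version under stated assumptions: the `peak` value and the use of the worse directional MSE are choices made here, and MPEG's reference metric software may use different normalization. The brute-force O(NM) search is for clarity only; real clouds would use a k-d tree such as `scipy.spatial.cKDTree`.

```python
import numpy as np

def nn_sq_dists(src, dst):
    """Squared distance from each point in src to its nearest neighbour in dst."""
    d = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
    return d.min(axis=1)

def point_to_point_psnr(original, decoded, peak):
    """Symmetric point-to-point geometry PSNR in dB (simplified sketch).

    Uses the worse (larger) of the two directional MSEs; `peak` is the
    signal peak, e.g. 2**bit_depth - 1 for voxelized content (an assumption).
    """
    mse = max(nn_sq_dists(original, decoded).mean(),
              nn_sq_dists(decoded, original).mean())
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

original = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
decoded = np.array([[1.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
psnr = point_to_point_psnr(original, decoded, peak=10.0)
```

Here each directional MSE is 0.5, so the PSNR is 10 log10(100 / 0.5), roughly 23 dB.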
- Research Article
18
- 10.1109/access.2020.3003753
- Jan 1, 2020
- IEEE Access
Smart monitoring, particularly at intersections, is a promising service that is being considered for the concept of smart cities. A network of light detection and ranging (LIDAR) sensors, which generates point cloud data in real time, can be used to detect people's mobility in smart monitoring. Due to the sheer volume of point cloud data, data transmission requires a significant amount of communication resources. In order to monitor people's mobility in real time, it is necessary to reduce the amount of transmission data to shorten delay. Point cloud compression is one method for reducing the amount of data. However, prior works addressing point cloud compression mainly focused on accuracy for the compression of an entire point cloud without considering its spatial characteristics. The more dynamically a spatial region changes, the more important it is when detecting moving objects such as cars, trucks, pedestrians, and bikes in smart monitoring. This paper proposes a prioritized transmission scheme that applies multiple point cloud compression methods to point cloud data according to the spatial importance of the data, i.e., how dynamically spatial regions change. This paper assumes data transmission of point cloud data from multiple LIDAR devices to an edge server and addresses the intra-frame geometry compression of point cloud data. The proposed scheme splits the point cloud into multiple classes according to the spatial importance and applies multiple point cloud compression methods to each class. A numerical study using a real point cloud dataset obtained at an intersection demonstrates the dependencies of quality, volume, and processing time on possible compression format options. The results verify that the proposed scheme reduces the amount of point cloud data drastically while satisfying the quality and processing time requirements.
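The scheme splits the cloud into classes by how dynamically each spatial region changes. The abstract does not define the importance measure, so the sketch below assumes a simple one: a region is "high" importance if any voxel inside it changed occupancy between two consecutive frames. The functions, the two-class split, and the voxel and region sizes are hypothetical.

```python
import numpy as np

def voxel_keys(points, voxel_size):
    """Set of integer voxel coordinates occupied by an (N, 3) point cloud."""
    return set(map(tuple, np.floor(points / voxel_size).astype(int).tolist()))

def classify_regions(prev_pts, curr_pts, region_size, voxel_size):
    """Label each occupied region of the current frame 'high' or 'low'.

    Hypothetical rule: a region is 'high' importance if any of its voxels
    changed occupancy between the previous and current frame.
    """
    changed = voxel_keys(prev_pts, voxel_size) ^ voxel_keys(curr_pts, voxel_size)
    ratio = int(region_size // voxel_size)  # voxels per region edge
    hot = {tuple(c // ratio for c in v) for v in changed}
    regions = {}
    for v in voxel_keys(curr_pts, voxel_size):
        r = tuple(c // ratio for c in v)
        regions[r] = "high" if r in hot else "low"
    return regions

prev = np.array([[0.5, 0.5, 0.5], [5.5, 0.5, 0.5]])
curr = np.array([[0.5, 0.5, 0.5], [6.5, 0.5, 0.5]])  # second point moved
regions = classify_regions(prev, curr, region_size=4, voxel_size=1)
```

Each class could then be sent with a different compression format, for example finer quantization for "high" regions and coarser quantization for static "low" regions.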
- Research Article
6
- 10.3390/s25061660
- Mar 7, 2025
- Sensors (Basel, Switzerland)
This meta-survey provides a comprehensive review of 3D point cloud (PC) applications in remote sensing (RS), essential datasets available for research and development purposes, and state-of-the-art point cloud compression methods. It offers a comprehensive exploration of the diverse applications of point clouds in remote sensing, including specialized tasks within the field, precision agriculture-focused applications, and broader general uses. Furthermore, datasets that are commonly used in remote-sensing-related research and development tasks are surveyed, including urban, outdoor, and indoor environment datasets; vehicle-related datasets; object datasets; agriculture-related datasets; and other more specialized datasets. Due to their importance in practical applications, this article also surveys point cloud compression technologies from widely used tree- and projection-based methods to more recent deep learning (DL)-based technologies. This study synthesizes insights from previous reviews and original research to identify emerging trends, challenges, and opportunities, serving as a valuable resource for advancing the use of point clouds in remote sensing.
- Research Article
12
- 10.1109/tip.2023.3265264
- Jan 1, 2023
- IEEE Transactions on Image Processing
We study the use of predictive approaches alongside the region-adaptive hierarchical transform (RAHT) in attribute compression of dynamic point clouds. The use of intra-frame prediction with RAHT was shown to improve attribute compression performance over pure RAHT and represents the state-of-the-art in attribute compression of point clouds, being part of MPEG's geometry-based test model. We studied a combination of inter-frame and intra-frame prediction for RAHT for the compression of dynamic point clouds. An adaptive zero-motion-vector (ZMV) scheme and an adaptive motion-compensated scheme are developed. The simple adaptive ZMV approach is able to achieve sizable gains over pure RAHT and over the intra-frame predictive RAHT (I-RAHT) for point clouds with little or no motion while ensuring similar compression performance to I-RAHT for point clouds with intense motion. The motion-compensated approach, more complex and more powerful, is able to achieve large gains across all of the tested dynamic point clouds.
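The predictive schemes above build on the RAHT butterfly, which merges two nodes with weights w1 and w2 using an orthonormal rotation whose angle depends on those weights. The sketch below applies it along a single axis to show the weighted merging only; a real codec transforms along x, y, and z over an octree, and the intra/inter prediction and ZMV logic from the paper are not modeled. Function names are hypothetical, and positions are assumed non-negative so all nodes eventually merge.

```python
import numpy as np

def raht_butterfly(x1, w1, x2, w2):
    """Merge two nodes; returns (low, high, merged_weight)."""
    a = np.sqrt(w1 / (w1 + w2))
    b = np.sqrt(w2 / (w1 + w2))
    return a * x1 + b * x2, -b * x1 + a * x2, w1 + w2

def raht_1d(positions, attrs):
    """Forward RAHT along one axis for occupied non-negative integer positions.

    Returns (dc, highs). Lone nodes pass up a level unchanged, keeping weight.
    """
    nodes = {int(p): (float(a), 1) for p, a in zip(positions, attrs)}  # pos -> (coef, weight)
    highs = []
    while len(nodes) > 1:
        parents = {}
        for p in sorted(nodes):  # left child is appended before right child
            parents.setdefault(p // 2, []).append(nodes[p])
        nodes = {}
        for parent, kids in parents.items():
            if len(kids) == 2:
                (x1, w1), (x2, w2) = kids
                low, high, w = raht_butterfly(x1, w1, x2, w2)
                highs.append(high)
                nodes[parent] = (low, w)
            else:
                nodes[parent] = kids[0]
    dc, _ = next(iter(nodes.values()))
    return dc, highs

dc, highs = raht_1d([0, 1, 3, 6], [1.0, 2.0, 3.0, 4.0])
```

Because the transform is orthonormal, the energy of the coefficients equals the energy of the attributes, and the DC term equals the attribute sum divided by the square root of the point count (5.0 here).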
- Research Article
1
- 10.1109/tgrs.2025.3573206
- Jan 1, 2025
- IEEE Transactions on Geoscience and Remote Sensing
LiDAR point cloud (LPC) compression is an indispensable component of 3D vision tasks, especially for dynamic point clouds. However, existing methods based on traditional spatial-temporal attention are immature and yield little improvement in inter-frame feature extraction. In this paper, we propose Diverse Attention-based Point Cloud Compression (DAPCC), an LPC compression entropy model that combines aggregation embedding modules for temporal point matching with spatial-temporal attention blocks for dynamic octree node encoding, and that can effectively exploit the change information of dynamic point clouds. Specifically, we first introduce aggregation embedding to match the octree sequences from two sweeps and establish temporal correlation. To effectively capture feature details, we further design a combined local and global attention over the spatial-temporal information of point clouds, which can attend to the whole context. Finally, we build a symmetric MLP module capable of strengthening vital features. We conduct static and dynamic compression experiments on indoor and outdoor point cloud benchmark datasets (i.e., ScanNet, SemanticKITTI, and the MPEG Common Test Conditions (CTC) Category 3 datasets) and on downstream applications (i.e., vehicle detection and semantic segmentation). Compared with previous state-of-the-art methods, our method achieves up to 14.7% bpp savings and 45% decoding-time savings, and adapts to downstream tasks with almost no impact on their performance.
- Research Article
75
- 10.1109/jproc.2021.3085957
- Sep 1, 2021
- Proceedings of the IEEE
This article presents a survey of point cloud compression (PCC) methods, organizing them with respect to data structure, coding representation space, and prediction strategy. Two paramount families of approaches reported in the literature, projection-based and octree-based methods, have proven efficient for encoding dense and sparse point clouds, respectively. These approaches are the pillars on which the Moving Picture Experts Group (MPEG) committee developed two PCC standards, published as final international standards in 2020 and early 2021, respectively, under the names video-based PCC and geometry-based PCC. After a survey of current PCC approaches, the technologies underlying the two standards are described in detail from an encoder perspective, providing guidance for potential implementers of the standards. In addition, experimental evaluations of the compression performance of both solutions are provided.
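Octree-based geometry coding can be illustrated by its basic signal: a breadth-first walk that emits one occupancy byte per internal node, with bit i set when child i contains points. The sketch below assumes the child index `4*bx + 2*by + bz` and skips entropy coding entirely; G-PCC defines its own conventions and context-codes these bytes, so this is only an illustration of the idea, and `octree_occupancy_codes` is a hypothetical name.

```python
import numpy as np

def octree_occupancy_codes(points, depth):
    """Breadth-first occupancy bytes for integer points in [0, 2**depth)^3.

    Each internal node emits one byte whose bit i is set if child i is
    occupied, with child index i = 4*bx + 2*by + bz (an assumed convention).
    """
    pts = np.unique(np.asarray(points, dtype=np.int64), axis=0)  # drop duplicates
    codes = []
    nodes = [pts]  # point subsets of the nodes at the current level
    for level in range(depth - 1, -1, -1):
        next_nodes = []
        for node_pts in nodes:
            bits = (node_pts >> level) & 1           # (n, 3) child bit per axis
            child = bits[:, 0] * 4 + bits[:, 1] * 2 + bits[:, 2]
            byte = 0
            for c in range(8):
                mask = child == c
                if mask.any():
                    byte |= 1 << c
                    next_nodes.append(node_pts[mask])
            codes.append(byte)
        nodes = next_nodes
    return codes

codes = octree_occupancy_codes([[0, 0, 0], [3, 3, 3]], depth=2)
```

For these two opposite corners of a 4x4x4 grid, the root emits 129 (children 0 and 7) and each occupied child emits one byte for its single occupied voxel, giving `[129, 1, 128]`. Sparse clouds produce few occupied nodes, which is why this representation suits them.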
- Research Article
2
- 10.3390/a16100484
- Oct 19, 2023
- Algorithms
Because real-world point cloud data are often very large, efficient transmission and storage have become critical concerns, and point cloud compression plays a decisive role in addressing them. Although capturing global information within point cloud data is important for effective compression, many existing point cloud compression methods overlook it. To address this, we propose an innovative end-to-end point cloud compression method designed to extract both global and local information. Our method includes a novel Transformer module to extract rich features from the point cloud. Using a pooling operation with no learnable parameters as the token mixer for computing long-distance dependencies enables global feature extraction while significantly reducing both computation and parameters. Furthermore, we employ convolutional layers for feature extraction. These layers not only preserve the spatial structure of the point cloud but also have a parameter count independent of the input point cloud size, resulting in a substantial reduction in parameters. Our experimental results demonstrate the effectiveness of the proposed TransPCGC network. It achieves average Bjøntegaard Delta Rate (BD-Rate) gains of 85.79% and 80.24% compared to Geometry-based Point Cloud Compression (G-PCC). Additionally, compared to the Learned-PCGC network, our approach attains average BD-Rate gains of 18.26% and 13.83%. It also reduces encoding and decoding time by 16% and model size by 50%.
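A parameter-free pooling token mixer, in the style popularized by PoolFormer, replaces attention with a local average minus the input itself. The sketch below applies that `AvgPool(x) - x` form to a dense 3D feature grid; whether TransPCGC uses exactly this form, and whether it operates on dense or sparse voxels, is an assumption here, and `pool_token_mixer` is a hypothetical name.

```python
import numpy as np

def pool_token_mixer(x, k=3):
    """Parameter-free token mixer: AvgPool(x) - x on a (D, H, W, C) grid.

    Averages over a k*k*k window (stride 1, same-size output), ignoring padded
    positions, then subtracts the input so only neighbourhood context remains.
    """
    r = k // 2
    pad = [(r, r)] * 3 + [(0, 0)]
    xp = np.pad(x, pad)                              # zero padding on spatial axes
    ones = np.pad(np.ones(x.shape[:3] + (1,)), pad)  # marks valid (unpadded) cells
    total = np.zeros_like(x, dtype=float)
    count = np.zeros(x.shape[:3] + (1,))
    D, H, W = x.shape[:3]
    for dz in range(k):
        for dy in range(k):
            for dx in range(k):
                total += xp[dz:dz + D, dy:dy + H, dx:dx + W]
                count += ones[dz:dz + D, dy:dy + H, dx:dx + W]
    return total / count - x

spike = np.zeros((3, 3, 3, 1))
spike[1, 1, 1, 0] = 27.0
out = pool_token_mixer(spike)
flat = pool_token_mixer(np.full((4, 4, 4, 2), 3.0))
```

The mixer has no weights at all, which is the source of the parameter savings the abstract mentions; a constant field maps to zero, and a single spike is spread to its neighbours.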
- Research Article
- 10.1016/j.jvcir.2025.104481
- Jul 1, 2025
- Journal of Visual Communication and Image Representation
MDLPCC: Misalignment-aware dynamic LiDAR point cloud compression
- Research Article
42
- 10.1109/tip.2023.3343096
- Jan 1, 2024
- IEEE Transactions on Image Processing
Efficient point cloud compression is essential for applications like virtual and mixed reality, autonomous driving, and cultural heritage. This paper proposes a deep learning-based inter-frame encoding scheme for dynamic point cloud geometry compression. We propose a lossy geometry compression scheme that predicts the latent representation of the current frame using the previous frame by employing a novel feature space inter-prediction network. The proposed network utilizes sparse convolutions with hierarchical multiscale 3D feature learning to encode the current frame using the previous frame. The proposed method introduces a novel predictor network for motion compensation in the feature domain to map the latent representation of the previous frame to the coordinates of the current frame to predict the current frame's feature embedding. The framework transmits the residual of the predicted features and the actual features by compressing them using a learned probabilistic factorized entropy model. At the receiver, the decoder hierarchically reconstructs the current frame by progressively rescaling the feature embedding. The proposed framework is compared to the state-of-the-art Video-based Point Cloud Compression (V-PCC) and Geometry-based Point Cloud Compression (G-PCC) schemes standardized by the Moving Picture Experts Group (MPEG). The proposed method achieves more than 88% BD-Rate (Bjøntegaard Delta Rate) reduction against G-PCCv20 Octree, more than 56% BD-Rate savings against G-PCCv20 Trisoup, more than 62% BD-Rate reduction against V-PCC intra-frame encoding mode, and more than 52% BD-Rate savings against V-PCC P-frame-based inter-frame encoding mode using HEVC. These significant performance gains are cross-checked and verified in the MPEG working group.
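The BD-Rate figures reported here summarize the average bitrate difference between two rate-distortion curves at equal quality. The sketch below follows the classic Bjøntegaard formulation: fit log-rate as a cubic polynomial of PSNR for each codec, integrate both fits over the overlapping PSNR interval, and convert the mean difference to a percentage. Some reference tools use piecewise cubic interpolation instead of a polynomial fit, so results can differ slightly; the sample rates and PSNRs are illustrative, not data from the paper.

```python
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Bjøntegaard delta rate (%) of `test` against `ref`.

    Negative values mean the test codec needs fewer bits for the same quality.
    """
    p_ref = np.polyfit(psnr_ref, np.log(rate_ref), 3)
    p_test = np.polyfit(psnr_test, np.log(rate_test), 3)
    lo = max(min(psnr_ref), min(psnr_test))  # overlapping PSNR interval
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

rates = np.array([100.0, 200.0, 400.0, 800.0])
psnrs = np.array([30.0, 33.0, 36.0, 39.0])
savings = bd_rate(rates, psnrs, 0.5 * rates, psnrs)  # half the bits, same quality
```

A codec that reaches the same PSNR at half the bitrate at every point yields a BD-Rate of -50%.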
- Conference Article
7
- 10.1109/mmsp48831.2020.9287165
- Sep 21, 2020
With the rapid development of point cloud acquisition technologies, high-quality human-shaped point clouds are increasingly used in VR/AR applications and in 3D graphics generally. To achieve near-realistic quality, such content usually contains an extremely high number of points (over 0.5 million points per 3D object per frame) and associated attributes (such as color). For this reason, efficient, dedicated 3D Point Cloud Compression (3DPCC) methods become mandatory. This requirement is even stronger for dynamic content, where the coordinates and attributes of the 3D points evolve over time. In this paper, we propose a novel skeleton-based 3DPCC approach dedicated to the specific case of dynamic point clouds representing humanoid avatars. The method relies on multi-view 2D human pose estimation of 3D dynamic point clouds. Using the DensePose neural network, we first extract the body parts from projected 2D images. The obtained 2D segmentation information is back-projected and aggregated into 3D space. This procedure makes it possible to partition the 3D point cloud into a set of 3D body parts. For each part, a 3D affine transform is estimated between every two consecutive frames and used for 3D motion compensation. The proposed approach has been integrated into the Video-based Point Cloud Compression (V-PCC) test model of MPEG. Experimental results show that, in the particular case of body motion with small amplitudes, the proposed method outperforms the V-PCC test model in the lossy inter-coding condition by up to 83% in terms of bitrate reduction at low bit rates. Moreover, the proposed framework has the potential to support various features such as regions of interest and levels of detail.
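Estimating a 3D affine transform per body part reduces to linear least squares once point correspondences between the two frames are available. The abstract does not say how correspondences are obtained, so the sketch below simply assumes row i of one frame matches row i of the next (in practice they might come from nearest neighbours or an ICP-style loop). `fit_affine_3d` and `apply_affine` are hypothetical names.

```python
import numpy as np

def fit_affine_3d(src, dst):
    """Least-squares 3D affine transform mapping src (N, 3) onto dst (N, 3).

    Assumes row i of src corresponds to row i of dst (N >= 4, not coplanar).
    Returns a 3x4 matrix M = [A | t] with dst ~ src @ A.T + t.
    """
    X = np.hstack([src, np.ones((len(src), 1))])   # homogeneous coords (N, 4)
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)   # (4, 3): rows are A.T then t
    return sol.T                                     # (3, 4)

def apply_affine(M, pts):
    """Apply a 3x4 affine matrix to an (N, 3) array of points."""
    return pts @ M[:, :3].T + M[:, 3]

rng = np.random.default_rng(0)
src = rng.random((50, 3))                            # one body part, frame t
M_true = np.array([[1.0, 0.1, 0.0, 0.5],
                   [0.0, 0.9, 0.0, -0.2],
                   [0.05, 0.0, 1.1, 0.3]])
dst = apply_affine(M_true, src)                      # same part, frame t+1
M_est = fit_affine_3d(src, dst)
residual = dst - apply_affine(M_est, src)            # what motion compensation leaves to code
```

With noise-free correspondences the fit recovers the transform exactly and the residual is zero; with real data, the 12 affine parameters per part plus a small residual replace the full geometry of the part.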
- Conference Article
23
- 10.1109/mmsp.2017.8122226
- Oct 1, 2017
Characterized by geometry and photometric attributes, point clouds have become widely used for the real-time presentation of various 3D objects and scenes. The development of ever more precise capture devices and increasing requirements for vivid rendering inevitably lead to huge numbers of points, making point cloud compression a pressing issue. Given non-uniform sampling and time-varying geometry, an appropriate structural representation for point clouds is important. In this paper, we propose a lossless geometry compression algorithm for 3D point clouds that serves as a basis for future adaptive improvements. We use a binary tree structure to effectively partition unorganized points into blocks. This hierarchical representation yields roughly the same number of points in each leaf node. We further perform intra-geometry prediction via an extended Travelling Salesman Problem (TSP) formulation, which achieves strong performance in eliminating point-wise redundancy while keeping a single reference position for each block. The residuals are encoded with PAQ, a lossless compression algorithm based on a shallow neural network. Simulation results confirm lossless compression of geometry from high-quality capture, with an efficiency gain of approximately 3.5 times over the state-of-the-art algorithm implemented in the MPEG Point Cloud Compression (PCC) reference software.
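The pipeline can be sketched in three steps: a median split along the widest axis builds a binary tree whose leaves hold similar numbers of points, each leaf is ordered into a short path, and the path is coded as one reference position plus successive differences. The greedy nearest-neighbour path below is a stand-in for the paper's extended TSP formulation, and PAQ entropy coding of the residuals is not shown; all function names are hypothetical.

```python
import numpy as np

def binary_partition(points, max_leaf):
    """Split at the median of the widest axis until leaves hold <= max_leaf points."""
    if len(points) <= max_leaf:
        return [points]
    axis = (points.max(axis=0) - points.min(axis=0)).argmax()
    order = np.argsort(points[:, axis], kind="stable")
    half = len(points) // 2
    return (binary_partition(points[order[:half]], max_leaf)
            + binary_partition(points[order[half:]], max_leaf))

def greedy_tour(points):
    """Nearest-neighbour ordering (a stand-in for an extended TSP solution)."""
    remaining = list(range(1, len(points)))
    path = [0]
    while remaining:
        last = points[path[-1]]
        j = min(remaining, key=lambda i: np.sum((points[i] - last) ** 2))
        remaining.remove(j)
        path.append(j)
    return points[path]

def block_residuals(block):
    """One reference position per block plus successive differences."""
    ordered = greedy_tour(block)
    return ordered[0], np.diff(ordered, axis=0)

def reconstruct_block(ref, res):
    """Invert block_residuals (losslessly for integer coordinates)."""
    return np.vstack([ref, ref + np.cumsum(res, axis=0)])

pts = np.random.default_rng(0).integers(0, 100, size=(20, 3))
leaves = binary_partition(pts, max_leaf=4)
coded = [block_residuals(leaf) for leaf in leaves]
```

Ordering points along a short path keeps consecutive differences small, which is what makes the residuals cheaper to entropy-code than raw coordinates.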
- Conference Article
2
- 10.1109/ispacs51563.2021.9651108
- Nov 16, 2021
In this paper, we present a novel 3D structure-aware, image-based point cloud compression scheme, which applies the proposed Symmetry based Convolutional Neural Pyramid (SCNP) to compress colored point clouds view by view for 3D model transmission. Given an input 3D model, a preprocessing step first represents the input point cloud as a sequence of view-specific six-dimensional (6D) images, where each pixel is characterized by an RGB color vector and an XYZ 3D point. The transformed 6D images preserve the regular grid structure, so redundant information can easily be removed by conventional image/video compression techniques. Our SCNP first represents each 6D image as a multi-level pyramid structure for progressive compression and transmission. The lowest-resolution image at the highest level of the pyramid is then decomposed into multiple patches, each coded as an index into a small dictionary through vector quantization. The residual images at the other levels are also represented by vector quantization codes with different patch sizes for progressively reconstructing the input colored point cloud. This process results in a multiple description coding scheme for 3D point cloud compression. With the pre-learned set of dictionaries, the projected view-specific 6D images of the input 3D model are encoded one by one to obtain the compressed results for 3D model transmission. At the receiver end, the 3D model is reconstructed by merging all the reconstructed point clouds, each decoded from the corresponding view-specific image. Finally, a conventional 3D reconstruction approach is applied to remove redundant 3D points and reconstruct the 3D model. Experiments demonstrate the effectiveness of our approach, which attains better performance than current state-of-the-art point cloud compression methods.
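The preprocessing step turns a colored point cloud into view-specific 6D images. A minimal sketch of one such view is an orthographic projection with a z-buffer: each pixel keeps the point closest to the camera and stores its RGB colour and XYZ position. The viewing direction (+z), the pixel size `resolution`, and the nearest-point rule are assumptions for illustration; the paper's projection setup is not specified in the abstract.

```python
import numpy as np

def project_to_6d_image(xyz, rgb, resolution, width, height):
    """Orthographic projection along +z onto a (height, width, 6) image.

    Each pixel keeps the point nearest the camera (smallest z), storing its
    RGB colour in channels 0-2 and XYZ position in channels 3-5; empty
    pixels stay NaN.
    """
    img = np.full((height, width, 6), np.nan)
    depth = np.full((height, width), np.inf)
    cols = np.floor(xyz[:, 0] / resolution).astype(int)
    rows = np.floor(xyz[:, 1] / resolution).astype(int)
    for r, c, p, col in zip(rows, cols, xyz, rgb):
        if 0 <= r < height and 0 <= c < width and p[2] < depth[r, c]:
            depth[r, c] = p[2]
            img[r, c, :3] = col  # RGB
            img[r, c, 3:] = p    # XYZ
    return img

xyz = np.array([[0.5, 0.5, 2.0], [0.5, 0.5, 1.0], [1.5, 0.5, 3.0]])
rgb = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]])
img = project_to_6d_image(xyz, rgb, resolution=1.0, width=2, height=1)
```

The two points that fall on the first pixel compete through the z-buffer, so only the nearer green point survives; points hidden in every view are lost, which is why several views are merged at the receiver.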