End-to-End Learned Lossy Dynamic Point Cloud Attribute Compression
Recent advancements in point cloud compression have primarily emphasized geometry, while comparatively fewer efforts have been dedicated to attribute compression. This study introduces an end-to-end learned lossy dynamic attribute coding approach that utilizes an efficient high-dimensional convolution to capture extensive inter-point dependencies, enabling the efficient projection of attribute features into latent variables. Subsequently, we employ a context model that leverages the previous latent space in conjunction with an auto-regressive context model for encoding the latent tensor into a bitstream. Evaluation of our method on widely used point cloud datasets from MPEG and Microsoft demonstrates its superior performance compared to the Region-Adaptive Hierarchical Transform, the core attribute compression module of MPEG Geometry-based Point Cloud Compression, with a 38.1% Bjontegaard Delta-rate saving on average while ensuring low-complexity encoding and decoding.
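The 38.1% figure above is a Bjontegaard Delta-rate. For readers unfamiliar with the metric, a minimal sketch of the standard computation (cubic fit of log-rate as a function of quality, integrated over the shared quality range; the function name and test values are illustrative, not from the paper):

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard Delta-rate: average % bitrate difference of the test
    RD curve vs. the anchor, integrated over the shared quality range."""
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    # Cubic fits of log-rate as a function of quality (PSNR).
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Integrate both fitted curves over the overlapping PSNR interval.
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100.0  # negative => bitrate saving
```

A test curve that matches the anchor's quality at half the bitrate everywhere yields exactly -50%.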
- Research Article
7
- 10.1109/ojsp.2022.3160392
- Jan 1, 2022
- IEEE Open Journal of Signal Processing
The dynamic point cloud is widely needed in 3D-vision-related applications such as virtual reality and telepresence. Due to the huge amount of data, a key technology enabling such applications is dynamic point cloud compression. The state-of-the-art dynamic point cloud compression scheme, video-based point cloud compression (V-PCC), generates 2D videos in which the patch segmentation and packing process introduces uncorrelated content, which degrades compression efficiency. In this paper, we propose a Packing with Patch Correlation Improvement (PPCI) algorithm that adaptively removes the uncorrelated parts between matched patches during packing to improve inter-prediction performance. We first propose a basic unidirectional patch re-segmentation operator that removes the parts of the patches in the current point cloud that are uncorrelated with the patches in its reference point cloud. The removed parts are formed into new patches and added to the patch collection of the current point cloud. Then we propose a back-and-forth structure, a combination of several basic patch re-segmentation operators, to bilaterally remove the uncorrelated parts of matched patches within a back-and-forth (BF) unit. Furthermore, we propose a framework to adaptively decide the best length of each BF unit in a point cloud sequence. Experimental results show that our method achieves noticeable bitrate savings compared with the existing V-PCC packing methods, particularly for sequences with small motion.
- Conference Article
168
- 10.1109/cvpr46437.2021.00598
- Jun 1, 2021
In this paper, we propose a two-stage deep learning framework called VoxelContext-Net for both static and dynamic point cloud compression. Taking advantage of both octree-based methods and voxel-based schemes, our approach employs the voxel context to compress the octree-structured data. Specifically, we first extract the local voxel representation that encodes the spatial neighbouring context information for each node in the constructed octree. Then, in the entropy coding stage, we propose a voxel-context-based deep entropy model to compress the symbols of non-leaf nodes in a lossless way. Furthermore, for dynamic point cloud compression, we additionally introduce the local voxel representations from the temporally neighbouring point clouds to exploit temporal dependency. More importantly, to alleviate the distortion from the octree construction procedure, we propose a voxel-context-based 3D coordinate refinement method to produce more accurate reconstructed point clouds at the decoder side, which is applicable to both static and dynamic point cloud compression. Comprehensive experiments on both static and dynamic point cloud benchmark datasets (e.g., ScanNet and SemanticKITTI) clearly demonstrate the effectiveness of our newly proposed method VoxelContext-Net for 3D point cloud geometry compression.
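The occupancy symbols that VoxelContext-Net's entropy model compresses can be illustrated with a minimal octree serializer: each non-leaf node emits one 8-bit symbol recording which of its eight children contain points. This is a sketch, not the paper's implementation; the breadth-first order and child-bit assignment are assumptions:

```python
import numpy as np

def octree_symbols(points, depth):
    """Serialize a voxelized point cloud (integer coords in [0, 2**depth))
    into one 8-bit occupancy symbol per octree node, breadth-first."""
    symbols = []
    nodes = [(np.asarray(points, dtype=np.int64), 2 ** depth)]
    while nodes:
        nxt = []
        for pts, size in nodes:
            if size == 1:
                continue  # leaf voxel: no further subdivision, no symbol
            half = size // 2
            occ = 0
            for child in range(8):
                # One bit of the child index per axis (x, y, z).
                bits = np.array([(child >> 2) & 1, (child >> 1) & 1, child & 1])
                mask = np.all((pts // half) == bits, axis=1)
                if mask.any():
                    occ |= 1 << child
                    nxt.append((pts[mask] - bits * half, half))
            symbols.append(occ)
        nodes = nxt
    return symbols
```

For example, a single point at the origin with `depth=2` produces one occupied child at every level, so the symbol stream is `[1, 1]`.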
- Research Article
5
- 10.1016/j.displa.2023.102528
- Sep 14, 2023
- Displays
Dynamic point clouds are widely used for 3D data representation in various applications such as immersive and mixed reality, robotics and autonomous driving. However, their irregularity and large scale make efficient compression and transmission a challenge. Existing methods require high bitrates to encode point clouds since temporal correlation is not well considered. This paper proposes an end-to-end dynamic point cloud compression network that operates in latent space, resulting in more accurate motion estimation and more effective motion compensation. Specifically, a multi-scale motion estimation network is introduced to obtain accurate motion vectors. Motion information computed at a coarser level is upsampled and warped to the finer level based on cost volume analysis for motion compensation. Additionally, a residual compression network is designed to mitigate the effects of noise and inaccurate predictions by encoding latent residuals, resulting in smaller conditional entropy and better results. In experiments, the proposed method achieves an average 12.09% (D1) and 14.76% (D2) BD-Rate gain over the state-of-the-art Deep Dynamic Point Cloud Compression (D-DPCC) method. Compared to V-PCC, our framework shows an average improvement of 81.29% (D1) and 77.57% (D2).
- Preprint Article
- 10.52843/cassyni.sw7s3y
- Nov 7, 2024
Due to the increased popularity of augmented and virtual reality experiences, as well as 3D sensing for autonomous driving, the interest in capturing high-resolution real-world point clouds has grown significantly in recent years. The point cloud is a new class of signal that is non-uniform and sparse, and this presents unique challenges to signal processing, compression and learning problems. In this talk, we present our multi-scale sparse convolutional learning and Graph Fourier Transform (GFT) based framework for large-scale point cloud processing, with applications to geometry and attribute super-resolution, and to dynamic point cloud compression with latent-space compensation. The architecture is memory-efficient and can learn deep networks to handle large-scale point clouds in real-world applications. Initial results demonstrate that this framework achieves new state-of-the-art results in geometry super-resolution, attribute deblocking and super-resolution, and dynamic point cloud sequence compression.
- Research Article
17
- 10.3390/s22031262
- Feb 7, 2022
- Sensors (Basel, Switzerland)
As a kind of information-intensive 3D representation, the point cloud is developing rapidly in immersive applications, which has also sparked new attention in point cloud compression. The most popular dynamic methods ignore the characteristics of point clouds and use an exhaustive neighborhood search, which seriously impacts the encoder's runtime. Therefore, we propose an improved compression method for dynamic point clouds based on curvature estimation and a hierarchical strategy to meet the demands of real-world scenarios. This method includes an initial segmentation derived from the similarity between normals, an iterative curvature-based hierarchical refinement process, and image generation and video compression technology based on de-redundancy without performance loss. The curvature-based hierarchical refinement module divides the voxelized point cloud into high-curvature and low-curvature points and optimizes the initial clusters hierarchically. The experimental results show that our method achieves improved compression performance and faster runtime than traditional video-based dynamic point cloud compression.
- Research Article
60
- 10.1109/tcsvt.2020.3015901
- Aug 18, 2020
- IEEE Transactions on Circuits and Systems for Video Technology
As 3D scanning devices and depth sensors advance, dynamic point clouds have attracted increasing attention as a format for 3D objects in motion, with applications in various fields such as immersive telepresence, navigation for autonomous driving and gaming. Nevertheless, the tremendous amount of data in dynamic point clouds significantly burdens transmission and storage. To this end, we propose a complete compression framework for attributes of 3D dynamic point clouds, focusing on optimal inter-coding. Firstly, we derive the optimal inter-prediction and predictive transform coding assuming a Gaussian Markov Random Field model with respect to a spatio-temporal graph underlying the attributes of dynamic point clouds. The optimal predictive transform proves to be the Generalized Graph Fourier Transform in terms of spatio-temporal decorrelation. Secondly, we propose refined motion estimation via efficient registration prior to inter-prediction, which searches the temporal correspondence between adjacent frames of irregular point clouds. Finally, we present a complete framework based on the optimal inter-coding and our previously proposed intra-coding, where we determine the optimal coding mode from rate-distortion optimization with the proposed offline-trained λ-Q model. Experimental results show that we achieve around 17% bit rate reduction on average over competitive dynamic point cloud compression methods.
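The paper's optimal transform is the Generalized Graph Fourier Transform derived from the GMRF model; the underlying spectral-decorrelation idea can be illustrated with a plain GFT over a Gaussian-weighted graph. This is a sketch only: the Gaussian edge weights and `sigma` are illustrative assumptions, not the paper's construction:

```python
import numpy as np

def gft(points, signal, sigma=1.0):
    """Graph Fourier Transform of a point cloud attribute signal:
    Gaussian-weighted fully connected graph, combinatorial Laplacian
    L = D - W, spectrum from its eigendecomposition."""
    pts = np.asarray(points, dtype=float)
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                 # no self-loops
    L = np.diag(W.sum(1)) - W
    eigvals, U = np.linalg.eigh(L)           # columns of U: Fourier basis
    coeffs = U.T @ np.asarray(signal, float)
    return eigvals, coeffs, U
```

A constant attribute signal over a connected graph concentrates all its energy in the zero-frequency (DC) coefficient, which is the decorrelation property the transform coding exploits.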
- Research Article
10
- 10.3929/ethz-a-006731956
- Jan 1, 2003
- Repository for Publications and Research Data (ETH Zurich)
In this paper, we present a coding framework addressing the compression of dynamic 3D point clouds which represent real-world objects and result from video acquisition using multiple cameras. The encoding is performed as an off-line process and is not time-critical. The decoding, however, must allow for real-time rendering of the dynamic 3D point cloud. We introduce a compression framework which encodes multiple attributes, such as depth and color, of 3D video fragments into progressive streams. The reference data structure is aligned with the original camera input images and thus allows for easy view-dependent decoding. The separate encoding of the object's silhouette allows the use of shape-adaptive compression algorithms. A novel differential coding approach permits random access in constant time throughout the complete data set and thus enables true free-viewpoint video.
- Conference Article
33
- 10.24963/ijcai.2022/126
- Jul 1, 2022
The non-uniformly distributed nature of the 3D Dynamic Point Cloud (DPC) poses significant challenges to highly efficient inter-frame compression. This paper proposes a novel 3D sparse convolution-based Deep Dynamic Point Cloud Compression (D-DPCC) network to compensate and compress the DPC geometry with 3D motion estimation and motion compensation in the feature space. In the proposed D-DPCC network, we design a Multi-scale Motion Fusion (MMF) module to accurately estimate the 3D optical flow between the feature representations of adjacent point cloud frames. Specifically, we utilize a 3D sparse convolution-based encoder to obtain the latent representation for motion estimation in the feature space and introduce the proposed MMF module for fused 3D motion embedding. Besides, for motion compensation, we propose a 3D Adaptively Weighted Interpolation (3DAWI) algorithm with a penalty coefficient to adaptively decrease the impact of distant neighbours. We compress the motion embedding and the residual with a lossy autoencoder-based network. To our knowledge, this paper is the first work proposing an end-to-end deep dynamic point cloud compression framework. Experimental results show that the proposed D-DPCC framework achieves an average 76% BD-Rate (Bjontegaard Delta Rate) gain against the state-of-the-art Video-based Point Cloud Compression (V-PCC) v13 in inter mode.
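The idea of interpolation with a penalty that down-weights distant neighbours can be sketched as follows. Note this is an illustration of the concept only: the inverse-distance weights, the `penalty` floor on the weight sum, and all parameter names are assumptions, not the exact 3DAWI formulation from D-DPCC:

```python
import numpy as np

def awi_3d(targets, sources, feats, k=3, penalty=2.0, eps=1e-8):
    """Sketch of adaptively weighted interpolation: each target point
    gathers features from its k nearest source points with inverse-
    distance weights; the penalty floor on the weight sum shrinks the
    result when all neighbours are far away (isolated targets)."""
    t = np.asarray(targets, float)[:, None, :]
    s = np.asarray(sources, float)[None, :, :]
    d = np.sqrt(((t - s) ** 2).sum(-1))          # (T, S) pairwise distances
    idx = np.argsort(d, axis=1)[:, :k]           # k nearest sources
    dk = np.take_along_axis(d, idx, axis=1)
    w = 1.0 / (dk + eps)
    denom = np.maximum(w.sum(1, keepdims=True), penalty)
    return (w[..., None] * np.asarray(feats, float)[idx]).sum(1) / denom
```

A target coinciding with a source reproduces that source's feature, while a target far from every source receives a feature shrunk toward zero by the penalty floor.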
- Research Article
- 10.1109/tip.2025.3648141
- Jan 1, 2026
- IEEE transactions on image processing : a publication of the IEEE Signal Processing Society
Point clouds have gained prominence across numerous applications due to their ability to accurately represent 3D objects and scenes. However, efficiently compressing unstructured, high-precision point cloud data remains a significant challenge. In this paper, we propose NeRC³, a novel point cloud compression framework that leverages implicit neural representations (INRs) to encode both geometry and attributes of dense point clouds. Our approach employs two coordinate-based neural networks: one maps spatial coordinates to voxel occupancy, while the other maps occupied voxels to their attributes, thereby implicitly representing the geometry and attributes of a voxelized point cloud. The encoder quantizes and compresses network parameters alongside auxiliary information required for reconstruction, while the decoder reconstructs the original point cloud by inputting voxel coordinates into the neural networks. Furthermore, we extend our method to dynamic point cloud compression through techniques that reduce temporal redundancy, including a 4D spatio-temporal representation termed 4D-NeRC³. Experimental results validate the effectiveness of our approach: for static point clouds, NeRC³ outperforms the octree-based G-PCC standard and existing INR-based methods. For dynamic point clouds, 4D-NeRC³ achieves superior geometry compression performance compared to the latest G-PCC and V-PCC standards, while matching state-of-the-art learning-based methods. It also demonstrates competitive performance in joint geometry and attribute compression.
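The core INR idea, a coordinate-based network mapping voxel coordinates to an occupancy probability, can be sketched with a toy MLP. The layer widths, activations, and initialization below are illustrative assumptions; the actual NeRC³ architecture is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

class CoordMLP:
    """Toy implicit neural representation: maps 3-D voxel coordinates
    to an occupancy probability in (0, 1). In an INR codec, the trained
    weights themselves (quantized and entropy-coded) are the bitstream."""
    def __init__(self, widths=(3, 32, 32, 1)):
        self.params = [(rng.standard_normal((i, o)) / np.sqrt(i), np.zeros(o))
                       for i, o in zip(widths[:-1], widths[1:])]

    def __call__(self, xyz):
        h = np.asarray(xyz, float)
        for k, (W, b) in enumerate(self.params):
            h = h @ W + b
            if k < len(self.params) - 1:
                h = np.maximum(h, 0.0)   # ReLU on hidden layers
        return 1 / (1 + np.exp(-h))      # sigmoid -> occupancy probability
```

The decoder side queries this network at every candidate voxel coordinate and thresholds the output to reconstruct geometry.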
- Conference Article
15
- 10.1109/icip42928.2021.9506333
- Sep 19, 2021
Point clouds in their uncompressed format require a very high data rate for storage and transmission. The video-based point cloud compression (V-PCC) technique projects a dynamic point cloud into geometry and texture video sequences. The projected geometry and texture video frames are then encoded using a modern video coding standard such as HEVC. However, the HEVC encoder is unable to exploit to a greater extent the global commonality that exists within a geometry frame and between successive geometry frames. This is because in HEVC, the current frame partitioning starts from a rigid 64 × 64 pixel level without considering the structure of the scene that needs to be coded. In this paper, an improved commonality modeling framework is proposed, leveraging cuboid-based frame partitioning, to encode point cloud geometry frames. The associated frame-partitioning scheme is based on statistical properties of the current geometry frame and therefore yields a flexible block partitioning structure composed of cuboids. Additionally, the proposed commonality modeling approach is computationally efficient and has a compact representation. Experimental results show that if the V-PCC reference encoder is augmented by the proposed commonality modeling technique, bit rate savings of 2.71% and 4.25% are achieved for full-body and upper-body human point cloud geometry sequences, respectively.
- Research Article
23
- 10.3390/s24103142
- May 15, 2024
- Sensors (Basel, Switzerland)
The substantial data volume within dynamic point clouds representing three-dimensional moving entities necessitates advancements in compression techniques. Motion estimation (ME) is crucial for reducing point cloud temporal redundancy. Standard block-based ME schemes, which typically utilize the previously decoded point clouds as inter-reference frames, often yield inaccurate and translation-only estimates for dynamic point clouds. To overcome this limitation, we propose an advanced patch-based affine ME scheme for dynamic point cloud geometry compression. Our approach employs a joint forward-backward ME strategy, generating affine motion-compensated frames for improved inter-geometry references. Before the forward ME process, point cloud motion analysis is conducted on previous frames to perceive motion characteristics. Then, a point cloud is segmented into deformable patches based on geometry correlation and motion coherence. During the forward ME process, affine motion models are introduced to describe the deformable patch motions from the reference to the current frame. Later, affine motion-compensated frames are exploited in the backward ME process to obtain refined motions for better coding performance. Experimental results demonstrate the superiority of our proposed scheme, achieving an average 6.28% geometry bitrate gain over the inter codec anchor. Additional results also validate the effectiveness of key modules within the proposed ME scheme.
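An affine motion model for a patch, unlike a translation-only model, captures rotation, scaling, and shear. A minimal sketch of fitting such a model to point correspondences by least squares (function names are illustrative; the paper's estimation pipeline is more elaborate):

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine motion model dst ≈ src @ A.T + t for a patch,
    solved from point correspondences (needs >= 4 non-coplanar pairs)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    X = np.hstack([src, np.ones((len(src), 1))])   # homogeneous coordinates
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)    # (4, 3) parameter matrix
    return M[:3].T, M[3]                           # A (3x3), t (3,)

def apply_affine(points, A, t):
    """Motion-compensate a patch with the fitted affine model."""
    return np.asarray(points, float) @ A.T + t
```

Given exact correspondences generated by an affine transform, the fit recovers the transform, so the compensated patch matches the target frame.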
- Conference Article
9
- 10.1109/iscas51556.2021.9401619
- May 1, 2021
The video-based point cloud compression (V-PCC) is the state-of-the-art dynamic point cloud compression technique. V-PCC projects the 3D point cloud data patch by patch to its bounding box and organizes the projected patches into a video frame, making full use of well-developed video coding tools. Despite its high efficiency, cracks easily appear in the reconstructed point cloud at various viewing angles, which seriously degrades the visual quality. In this paper, we propose an efficient method to improve the visual quality of the dynamic point cloud, especially for the main view from the content provider. The relationship between patches and views is exploited, and an algorithm intelligently reserving points that may be discarded in V-PCC is proposed. According to our subjective and perceptual objective evaluation experiments, compared with V-PCC, the overall visual quality of the reconstructed point cloud is evidently improved. In particular, cracks are mended with our proposed method. A Bjontegaard delta bit-rate reduction of up to 3.1% is achieved with respect to the Point Cloud Quality Metric (PCQM), which partially verifies the improvement of subjective quality when adopting the proposed method.
- Research Article
54
- 10.1017/atsip.2018.15
- Jan 1, 2018
- APSIPA Transactions on Signal and Information Processing
We introduce the polygon cloud, a compressible representation of three-dimensional geometry (including attributes, such as color), intermediate between polygonal meshes and point clouds. Dynamic polygon clouds, like dynamic polygonal meshes and dynamic point clouds, can take advantage of temporal redundancy for compression. In this paper, we propose methods for compressing both static and dynamic polygon clouds, specifically triangle clouds. We compare triangle clouds to both triangle meshes and point clouds in terms of compression, for live captured dynamic colored geometry. We find that triangle clouds can be compressed nearly as well as triangle meshes, while being more robust to noise and other structures typically found in live captures, which violate the assumption of a smooth surface manifold, such as lines, points, and ragged boundaries. We also find that triangle clouds can be used to compress point clouds with significantly better performance than previously demonstrated point cloud compression methods. For intra-frame coding of geometry, our method improves upon octree-based intra-frame coding by a factor of 5–10 in bit rate. Inter-frame coding improves this by another factor of 2–5. Overall, our proposed method improves over the previous state-of-the-art in dynamic point cloud compression by 33% or more.
- Conference Article
7
- 10.1109/euvip.2018.8611760
- Nov 1, 2018
With the recent improvements in acquisition techniques for 3D media applications, it has become easier to collect 3D data, for example, dynamic point cloud data. Such point clouds consist of a large number of 3D coordinates, which describe a scene or object in 3D space by its geometry and texture attributes. Moreover, they are an effective representation of 3D environments for applications such as Augmented Reality or Virtual Reality. One of the main problems with such data is that the number of points is typically too large to allow for real-time transmission or efficient storage. Thus, compressing such 3D data is a key issue to reduce the amount of required bandwidth or memory. This paper presents a method for efficient compression of dynamic point cloud data within the current MPEG standardization framework for dynamic point cloud compression. The key benefit of the presented work is the reduced number of encoded and decoded 3D points compared to the reference framework; thus, encoding and decoding complexity is reduced significantly. Objective results show a speed-up of around 35-40% in coding times. Furthermore, reconstruction quality is preserved, thus reducing bit rate requirements by up to 30%. Visual results verify the improved reconstruction quality, and compared to the reference at the same computational complexity, coding efficiency is improved by over 40%.
- Conference Article
12
- 10.1109/icassp39728.2021.9414171
- Jun 6, 2021
Immersive media representation formats based on point clouds have underpinned significant opportunities for extended reality applications. Point clouds in their uncompressed format require a very high data rate for storage and transmission. The video-based point cloud compression technique projects a dynamic point cloud into geometry and texture video sequences. The projected texture video is then coded using a modern video coding standard such as HEVC. Since the properties of projected texture video frames differ from those of traditional video frames, HEVC-based commonality modeling can be inefficient. An improved commonality modeling technique is proposed that employs discrete-cosine-basis-oriented motion models whose domains are approximated by homogeneous regions called cuboids. Experimental results show that the proposed commonality modeling technique can yield bit rate savings of up to 4.17%.
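A discrete-cosine-basis motion model represents a dense motion field over a cuboid as a small linear combination of smooth 2-D DCT basis functions, so only a few coefficients need to be coded. A minimal sketch (the basis truncation `kmax` and function names are illustrative assumptions):

```python
import numpy as np

def dct_basis_2d(h, w, kmax):
    """First kmax x kmax orthonormal 2-D DCT-II basis functions over an
    h x w region; a smooth motion field over the region is modelled as
    a small linear combination of these."""
    def c1d(n, k):
        x = (np.arange(n) + 0.5) * np.pi / n   # DCT-II sample points
        v = np.cos(k * x)
        return v / np.linalg.norm(v)
    return np.stack([np.outer(c1d(h, ky), c1d(w, kx))
                     for ky in range(kmax) for kx in range(kmax)])

def motion_field(coeffs, basis):
    """Dense per-pixel motion from a compact DCT coefficient vector."""
    return np.tensordot(coeffs, basis, axes=1)
```

The DC basis function alone reproduces a pure translation, so the model strictly generalizes translational motion while the higher-order terms add smooth deformation.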