Mesh Coding Extensions to MPEG-I V-PCC
Dynamic point clouds and meshes are used in a wide variety of applications, such as gaming, visualization, medicine, and, more recently, AR/VR/MR. This paper presents two extensions of the MPEG-I Video-based Point Cloud Compression (V-PCC) standard to support mesh coding. The extensions are based on the Edgebreaker and TFAN mesh connectivity coding algorithms, implemented in the Google Draco software and the MPEG SC3DMC software, respectively. Lossless results for the proposed frameworks, built on top of version 8.0 of the MPEG-I V-PCC test model (TMC2), are presented and compared with Draco for dense meshes.
- Research Article
73
- 10.1109/tmm.2020.3016894
- Aug 17, 2020
- IEEE Transactions on Multimedia
The state-of-the-art 2D-based dynamic point cloud (DPC) compression algorithm is video-based point cloud compression (V-PCC), developed by the Moving Picture Experts Group (MPEG). It first projects the DPC patch by patch from 3D to 2D and organizes the projected patches into a video, which is then compressed with High Efficiency Video Coding (HEVC). However, the resulting frames contain many unoccupied pixels that can significantly affect coding efficiency. These unoccupied pixels are currently padded using either the average of the 4-neighbors for geometry or the push-pull algorithm for the color attribute. While these algorithms are simple, they do not handle the unoccupied pixels in the most efficient way. In this paper, we divide the unoccupied pixels into two groups according to the occupancy map: those that should be occupied and those that should not. We then design padding algorithms tailored to each group to improve the rate-distortion performance of the V-PCC reference software for both geometry and the color attribute. The first group consists of unoccupied pixels that should be occupied according to the block-based occupancy map. We pad these pixels using real points from the original DPC to improve the quality of the reconstructed DPC, while maintaining the smoothness of each block so as not to harm video compression efficiency. The second group consists of pixels correctly identified as unoccupied by the block-based occupancy map. These pixels do not contribute to the reconstructed quality of the DPC, so we minimize their bit cost without considering their reconstruction quality. This bit cost is determined by the residue of these pixels, obtained by subtracting the prediction pixels from the original pixels.
We therefore propose padding the residue with the average residue of the occupied pixels to minimize the bit cost. The proposed algorithms are implemented in the V-PCC and corresponding HEVC reference software. Experimental results show that the proposed algorithms bring significant bitrate savings compared with V-PCC.
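The residue padding for the second group can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It works on one block at a time and assumes the prediction block is already known; in a real encoder, that block comes from HEVC intra or inter prediction. The function name is hypothetical.

```python
def pad_block_with_mean_residue(original, prediction, occupied):
    """Pad the unoccupied pixels of one block (hypothetical helper).

    original, prediction: 2D lists of pixel values with the same shape.
    occupied: 2D list of booleans, True where the occupancy map marks a point.
    Each unoccupied pixel is set so that its residue (original - prediction)
    equals the mean residue of the occupied pixels in the block.
    """
    rows, cols = len(original), len(original[0])
    residues = [original[r][c] - prediction[r][c]
                for r in range(rows) for c in range(cols) if occupied[r][c]]
    # If nothing in the block is occupied, copying the prediction (zero
    # residue) is the cheapest choice.
    mean_res = sum(residues) / len(residues) if residues else 0.0
    padded = [row[:] for row in original]
    for r in range(rows):
        for c in range(cols):
            if not occupied[r][c]:
                # Round because pixel samples are integers.
                padded[r][c] = round(prediction[r][c] + mean_res)
    return padded


# Occupied residues are 2 and 4, so both padded pixels get residue 3.
print(pad_block_with_mean_residue([[10, 0], [14, 0]],
                                  [[8, 8], [10, 10]],
                                  [[True, False], [True, False]]))
# -> [[10, 11], [14, 13]]
```

Among all constant fill values, the mean of the occupied residues minimizes the residual block's energy around its own mean (its non-DC energy). That energy is a reasonable proxy for the bits a transform coder spends on the block.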
- Research Article
8
- 10.3390/s23125623
- Jun 15, 2023
- Sensors (Basel, Switzerland)
This article describes an empirical exploration of the effect of information loss in compressed representations of dynamic point clouds on the subjective quality of the reconstructed point clouds. The study compressed a set of test dynamic point clouds with the MPEG V-PCC (Video-based Point Cloud Compression) codec at five compression levels. It then applied simulated packet losses at three packet loss rates (0.5%, 1%, and 2%) to the V-PCC sub-bitstreams before decoding and reconstructing the dynamic point clouds. Human observers assessed the quality of the recovered dynamic point clouds in experiments conducted at two research laboratories, in Croatia and Portugal, to collect MOS (Mean Opinion Score) values. These scores were subjected to a set of statistical analyses. The analyses measured the correlation between the data from the two laboratories, as well as the correlation between the MOS values and a selection of objective quality measures, taking into account compression level and packet loss rate. The objective quality measures considered, all of the full-reference type, included point cloud-specific measures as well as measures adapted from image and video quality assessment. Among the image-based quality measures, FSIM (Feature Similarity index), MSE (Mean Squared Error), and SSIM (Structural Similarity index) yielded the highest correlation with subjective scores in both laboratories. PCQM (Point Cloud Quality Metric) showed the highest correlation among the point cloud-specific objective measures. The study showed that even a 0.5% packet loss rate reduces the subjective quality of the decoded point clouds by more than 1 to 1.5 MOS scale units, pointing to the need to adequately protect the bitstreams against losses.
The results also showed that degradations in the V-PCC occupancy and geometry sub-bitstreams have a significantly greater negative impact on the subjective quality of the decoded point cloud than degradations in the attribute sub-bitstream.
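The analysis above rests on two computations: averaging observer ratings into MOS values, and correlating those MOS values with an objective measure. The Python sketch below shows both, using only the standard library. It is an illustration, not the study's analysis pipeline. Studies of this kind often fit a nonlinear (for example logistic) mapping before computing the Pearson coefficient, and this sketch omits that step. All function names are hypothetical.

```python
from statistics import mean


def mos(scores_per_stimulus):
    """Mean Opinion Score: average of all observers' ratings per stimulus."""
    return [mean(scores) for scores in scores_per_stimulus]


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank-order correlation coefficient (SROCC)."""
    return pearson(ranks(x), ranks(y))
```

A typical use would be `pearson(mos(raw_scores), metric_values)`, computed once per objective measure and once per laboratory.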
- Research Article
16
- 10.1109/tip.2023.3327003
- Jan 1, 2023
- IEEE Transactions on Image Processing
A dynamic point cloud is volumetric visual data that represents realistic 3D scenes for virtual reality and augmented reality applications. However, its large data volume has become a bottleneck for processing, transmission, and storage, which calls for effective compression. In this paper, we propose a Perceptually Weighted Rate-Distortion Optimization (PWRDO) scheme for Video-based Point Cloud Compression (V-PCC), which aims to minimize the perceptual distortion of the reconstructed point cloud at a given bit rate. First, we propose a general framework for perceptually optimized V-PCC that exploits visual redundancies in point clouds. Second, we propose a multi-scale Projection-based Point Cloud quality Metric (PPCM) to measure the perceptual quality of 3D point clouds. The PPCM model comprises 3D-to-2D patch projection, multi-scale structural distortion measurement, and a fusion model. Approximations and simplifications of PPCM are also presented for V-PCC integration and low complexity. Third, based on the simplified PPCM model, we propose a PWRDO scheme with Lagrange multiplier adaptation, which is incorporated into V-PCC to improve coding efficiency. Experimental results show that the proposed PPCM models can be used as standalone quality metrics and achieve higher consistency with human subjective scores than state-of-the-art objective visual quality metrics. Compared with the latest V-PCC reference model, the proposed PWRDO-based V-PCC scheme achieves average bit rate reductions of 13.52%, 8.16%, 10.56%, and 9.54% in terms of four objective visual quality metrics for point clouds, significantly outperforming state-of-the-art coding algorithms. The proposed PWRDO increases the computational complexity of the V-PCC encoder and decoder by only 1.71% and 0.05% on average, respectively, which is negligible.
The source codes of the PPCM and PWRDO schemes are available at https://github.com/VVCodec/PPCM-PWRDO.
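The core idea of Lagrange multiplier adaptation can be shown with a toy mode decision. The encoder normally picks the candidate minimizing $J = D + \lambda R$. If a block's distortion is weighted by a perceptual weight $w$, minimizing $wD + \lambda R$ is equivalent to using $\lambda / w$. This is a generic sketch under that assumption only. The paper derives its weights from the simplified PPCM model, which is not reproduced here, so `weight` is simply an input. The function names are hypothetical.

```python
def adapted_lambda(base_lambda, weight):
    """Minimizing w*D + lam*R is equivalent to minimizing D + (lam/w)*R."""
    if weight <= 0:
        raise ValueError("perceptual weight must be positive")
    return base_lambda / weight


def choose_mode(candidates, base_lambda, weight):
    """Pick the (name, distortion, rate) candidate with the lowest RD cost."""
    lam = adapted_lambda(base_lambda, weight)
    return min(candidates, key=lambda c: c[1] + lam * c[2])


modes = [("coarse", 100.0, 10.0), ("fine", 60.0, 30.0)]
print(choose_mode(modes, 1.5, 1.0)[0])  # fine:   105 < 115
print(choose_mode(modes, 1.5, 0.5)[0])  # coarse: 130 < 150
```

A perceptually less important block (smaller `weight`) gets a larger effective λ, so the encoder spends fewer bits on it.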
- Conference Article
3
- 10.1145/3639592.3639602
- Dec 16, 2023
Dynamic point clouds give objects or scenes a realistic 3D representation in motion. Efficient storage and transmission of dynamic point clouds is an essential precondition for their application. Video-based point cloud compression (V-PCC), developed by the MPEG standardization group, achieves remarkable performance in compressing dynamic point clouds. However, it also introduces compression noise into decoded dynamic point clouds, which can significantly affect subsequent applications. In this paper, we propose a quality enhancement architecture that focuses on improving the color attributes of V-PCC-compressed point clouds. The architecture uses a sparse fully convolutional network built with the Minkowski Engine, which preserves the sparse nature of point cloud data and speeds up learning with less memory usage. Additionally, we apply a feature extraction unit that takes cross-channel information into account. To account for coordinate compression noise and limited GPU memory, coordinate optimization and patch generation are applied to the input data as a pre-processing step. To the best of our knowledge, this is the first use of the Minkowski Engine to enhance the color attributes of compressed point clouds in the V-PCC field. Experimental results demonstrate that the proposed architecture improves the quality of color attributes in the reconstructed point cloud across different quantization parameters.
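The sparsity argument above can be illustrated with a plain-Python convolution. It is evaluated only at occupied voxels and reads only occupied neighbours, so empty space costs neither memory nor computation. This is a conceptual sketch, not the Minkowski Engine API and not the paper's network. It assumes stride 1 with output coordinates equal to input coordinates. The voxel dictionary layout and function names are hypothetical.

```python
from itertools import product


def cube_offsets(k=3):
    """All integer offsets of a k x k x k kernel centred on the origin."""
    r = k // 2
    return list(product(range(-r, r + 1), repeat=3))


def sparse_conv3d(voxels, weights, bias):
    """Convolution evaluated only at occupied voxels (illustrative sketch).

    voxels:  {(x, y, z): [f_1, ..., f_Cin]} for occupied voxels only
    weights: {(dx, dy, dz): W}, W a Cout x Cin list of lists
    bias:    list of Cout values
    Output coordinates equal input coordinates, so the occupied set does
    not grow from layer to layer.
    """
    out = {}
    for (x, y, z) in voxels:
        acc = list(bias)
        for (dx, dy, dz), w in weights.items():
            neighbour = voxels.get((x + dx, y + dy, z + dz))
            if neighbour is None:
                continue  # empty space: no contribution, no computation
            for o, row in enumerate(w):
                acc[o] += sum(wi * fi for wi, fi in zip(row, neighbour))
        out[(x, y, z)] = acc
    return out


cloud = {(0, 0, 0): [1.0], (1, 0, 0): [2.0], (10, 10, 10): [5.0]}
ones = {off: [[1.0]] for off in cube_offsets()}
print(sparse_conv3d(cloud, ones, [0.0]))
# {(0, 0, 0): [3.0], (1, 0, 0): [3.0], (10, 10, 10): [5.0]}
```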
- Conference Article
9
- 10.1109/iscas51556.2021.9401619
- May 1, 2021
Video-based point cloud compression (V-PCC) is the state-of-the-art dynamic point cloud compression technique. V-PCC projects the 3D point cloud data patch by patch onto its bounding box and organizes the projected patches into a video frame, making full use of well-developed video coding tools. Despite its high efficiency, cracks readily appear in the reconstructed point cloud at various viewing angles, which seriously degrades visual quality. In this paper, we propose an efficient method to improve the visual quality of dynamic point clouds, especially for the main view chosen by the content provider. The relationship between patches and views is exploited, and an algorithm that selectively retains points that V-PCC might otherwise discard is proposed. According to our subjective and perceptual objective evaluation experiments, the overall visual quality of the reconstructed point cloud is clearly improved compared with V-PCC. In particular, cracks are mended by the proposed method. A Bjøntegaard delta bit-rate reduction of up to 3.1% is achieved with respect to the Point Cloud Quality Metric (PCQM), which partially verifies the subjective quality improvement of the proposed method.
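The Bjøntegaard delta bit rate (BD-rate) reported above compares two rate-quality curves at equal quality. A common way to compute it fits log10(rate) as a cubic function of quality for each curve. It then averages the difference between the fits over the overlapping quality range. The Python sketch below follows that approach under these assumptions. It is not any particular reference tool's code. With PCQM, the PCQM values simply play the role of the quality axis. Function names are hypothetical.

```python
import math


def _solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def _fit_cubic(t, y):
    """Least-squares coefficients c0..c3 of y ~ c0 + c1 t + c2 t^2 + c3 t^3."""
    ata = [[sum(ti ** (i + j) for ti in t) for j in range(4)] for i in range(4)]
    aty = [sum(yi * ti ** i for ti, yi in zip(t, y)) for i in range(4)]
    return _solve(ata, aty)


def _integrate(c, lo, hi):
    return sum(ci / (i + 1) * (hi ** (i + 1) - lo ** (i + 1))
               for i, ci in enumerate(c))


def bd_rate(rate_ref, q_ref, rate_test, q_test):
    """Average bitrate change (%) of the test codec at equal quality."""
    # Normalize the quality axis to about [-1, 1] for numerical stability.
    all_q = list(q_ref) + list(q_test)
    center = (max(all_q) + min(all_q)) / 2
    scale = (max(all_q) - min(all_q)) / 2 or 1.0
    t_ref = [(q - center) / scale for q in q_ref]
    t_test = [(q - center) / scale for q in q_test]
    # Fit log10(rate) as a cubic function of normalized quality.
    c_ref = _fit_cubic(t_ref, [math.log10(r) for r in rate_ref])
    c_test = _fit_cubic(t_test, [math.log10(r) for r in rate_test])
    # Average the gap only over the quality range covered by both curves.
    lo, hi = max(min(t_ref), min(t_test)), min(max(t_ref), max(t_test))
    avg_diff = (_integrate(c_test, lo, hi) - _integrate(c_ref, lo, hi)) / (hi - lo)
    return (10 ** avg_diff - 1) * 100


q = [30.0, 33.0, 36.0, 38.5]
print(bd_rate([100, 200, 400, 800], q, [90, 180, 360, 720], q))  # about -10.0
```

A negative value means the test codec needs fewer bits than the reference for the same quality.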
- Research Article
3
- 10.1016/j.dsp.2024.104471
- Mar 21, 2024
- Digital Signal Processing
Cracks-suppression perceptual geometry coding for dynamic point clouds
- Research Article
2
- 10.3390/electronics14071295
- Mar 25, 2025
- Electronics
With the rapid development of 5G technology and 3D capture techniques, demand for efficient compression of dynamic 3D point cloud data has increased markedly. Video-based point cloud compression (V-PCC), an innovative method for 3D point cloud compression, uses High-Efficiency Video Coding (HEVC) to compress 3D point clouds after projecting them onto two-dimensional video frames. However, V-PCC has high coding complexity, particularly for dynamic 3D point clouds, which can be up to four times more complex to process than conventional video. To address this challenge, we propose an adaptive coding unit (CU) partitioning method that integrates occupancy maps, convolutional neural networks (CNNs), and Bayesian optimization. In this approach, CUs are first divided into dense regions, sparse regions, and complex composite regions by calculating the occupancy rate R of each CU. A CNN framework then makes an initial classification decision. For regions where the CNN outputs low-confidence classifications, Bayesian optimization is employed to refine the partitioning and improve accuracy. Experimental results show that the proposed method efficiently reduces the coding complexity of V-PCC while maintaining high coding quality. Specifically, the average coding time is reduced by 57.37% for the geometry map, 54.43% for the attribute map, and 54.75% overall. Although the BD-rate increases slightly compared with the baseline V-PCC method, the impact on video quality is negligible. Additionally, the proposed algorithm outperforms existing methods in terms of geometry compression efficiency and computational time savings.
This study’s innovation lies in combining deep learning with Bayesian optimization to deliver an efficient CU partitioning strategy for V-PCC, improving coding speed and reducing computational resource consumption, thereby advancing the practical application of V-PCC.
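The first two stages described above can be sketched as follows. The first stage computes the occupancy rate R of a CU from the occupancy map and bins it. The second stage keeps a CNN decision only when it is confident and otherwise hands the CU to a refinement step. The thresholds, labels, and function names are hypothetical placeholders. The abstract gives no threshold values, and treating "complex" as the intermediate R range is an assumption. The Bayesian optimization itself is not shown.

```python
def occupancy_rate(occ, x0, y0, size):
    """Fraction R of occupied pixels in the size x size CU at (x0, y0)."""
    filled = sum(occ[y][x] for y in range(y0, y0 + size)
                 for x in range(x0, x0 + size))
    return filled / (size * size)


def classify_cu(rate, sparse_th=0.25, dense_th=0.75):
    """Hypothetical thresholds; the paper's values are not given here."""
    if rate >= dense_th:
        return "dense"
    if rate <= sparse_th:
        return "sparse"
    return "complex"


def route_decision(cnn_probs, confidence_th=0.8):
    """Keep the CNN's split decision only when it is confident enough."""
    label = max(cnn_probs, key=cnn_probs.get)
    if cnn_probs[label] >= confidence_th:
        return label
    return "refine_with_bayesian_optimization"


occ = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(classify_cu(occupancy_rate(occ, 0, 0, 4)))         # sparse (R = 0.25)
print(route_decision({"split": 0.55, "no_split": 0.45}))  # refine_with_...
```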
- Research Article
25
- 10.1109/tmm.2023.3347638
- Jan 1, 2024
- IEEE Transactions on Multimedia
Video-based point cloud compression (V-PCC) is a state-of-the-art Moving Picture Experts Group (MPEG) standard for point cloud compression. V-PCC can compress both static and dynamic point clouds losslessly, near-losslessly, or lossily. Many objective quality metrics have been proposed for distorted point clouds. Most of these are full-reference metrics that require both the original and the distorted point cloud. However, in some real-time applications the original point cloud is not available, and no-reference or reduced-reference quality metrics are needed. Designing a reduced-reference quality metric involves three main challenges: building a set of features that characterize the visual quality of the distorted point cloud, selecting the most effective features from this set, and mapping the selected features to a perceptual quality score. We address the first challenge by proposing a comprehensive set of features consisting of compression, geometry, normal, curvature, and luminance features. To address the second challenge, we use the least absolute shrinkage and selection operator (LASSO), a variable selection method for regression problems. Finally, we map the selected features to the mean opinion score in a nonlinear space. Although we use only 19 features in our current implementation, the metric is flexible enough to accommodate any number of features, including more effective ones developed in the future. Experimental results on the Waterloo point cloud dataset version 2 (WPC2.0) and the MPEG point cloud compression dataset (M-PCCD) show that our method, PCQAML, outperforms state-of-the-art full-reference and reduced-reference quality metrics in terms of the Pearson linear correlation coefficient, Spearman rank-order correlation coefficient, Kendall rank-order correlation coefficient, and root mean squared error.
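LASSO performs feature selection by driving the coefficients of uninformative features to exactly zero. The Python sketch below minimizes $\frac{1}{2n}\lVert y - Xb\rVert^2 + \alpha\lVert b\rVert_1$ by coordinate descent with soft thresholding. It is a generic illustration, not PCQAML's training code. It assumes features are only centered; they are often also standardized. The value of `alpha` and the function names are hypothetical.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t, returning exactly 0 inside [-t, t]."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso(X, y, alpha, n_iter=200):
    """Coordinate-descent LASSO on centered data (no intercept needed)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual y - Xb with b = 0
    b = [0.0] * p
    z = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0:
                continue  # constant feature carries no information
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, alpha) / z[j]
            delta = new - b[j]
            if delta:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                b[j] = new
    return b


# Feature 0 explains y; feature 1 is an irrelevant alternating signal.
X = [[i, (-1) ** (i + 1)] for i in range(1, 7)]
y = [2.0 * i for i in range(1, 7)]
b = lasso(X, y, alpha=0.1)
print([j for j, bj in enumerate(b) if bj != 0.0])  # [0]
```

The indices with nonzero coefficients form the selected feature subset. In PCQAML's setting, that subset is then passed to the nonlinear mapping stage.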
- Research Article
36
- 10.1109/access.2020.2991478
- Jan 1, 2020
- IEEE Access
A point cloud visualizes information by placing voxels, each with a color value and a position value, in three-dimensional space. Because a point cloud uses hundreds of thousands or millions of points to visualize information, it requires far more bits than existing 2D media. It is therefore essential to compress point data for transmission and storage. The Moving Picture Experts Group (MPEG) is developing a point cloud compression method based on 2D video, which takes advantage of the coding efficiency of video codecs and their wide adoption across industries. This method is called video-based point cloud compression (V-PCC). Video codecs generally rely on block-matching algorithms for motion estimation. Because V-PCC uses 2D video codecs, the motion information it uses is obtained from 2D video sequences. This 2D motion information cannot fully characterize the motion of the 3D points, which also limits compression efficiency. In this paper, we propose a method for estimating and compensating motion in terms of a 3D object when compressing a dynamic object point cloud with a conventional video codec. The proposed 3D motion estimation and compensation technique achieved higher overall BD-rate gains and was shown to compress 3D point cloud content effectively on the basis of 3D motion.
- Research Article
12
- 10.1109/access.2021.3118806
- Jan 1, 2021
- IEEE Access
Dynamic point clouds (DPCs) are a new media storage format that lets end users view objects and scenes in three dimensions (3D), from different angles and over time. However, the raw size of a point cloud is huge: a single point cloud can contain millions of points, each carrying a color triplet and a location triplet, and a DPC contains multiple point clouds. Video-based point cloud compression (V-PCC) was developed to project a 3D point cloud onto 2D images: attribute, geometry, and occupancy images. After padding, the 2D images are compressed with the well-established High Efficiency Video Coding (HEVC) standard. In this study, we first use the occupancy image to define a blocky occupancy flag (BOF) that denotes occupancy information on a block basis. For coding attribute and geometry images, we use the BOF to develop a fast coding unit (CU) algorithm that terminates the CU search recursion early. We also use the geometry images to compute the 2D and 3D information of each pixel and exploit the 2D/3D spatial homogeneity of the pixels to design a fast CU decision. In addition, we propose a modified rate-distortion optimization for different color components that considers the picture order count (POC) structure in HEVC/V-PCC. Finally, we propose an HEVC input pixel modification method based on the BOF that reduces the unnecessary information coded for attribute images. Compared with the state-of-the-art fast V-PCC encoding method, the proposed work improves the Bjøntegaard delta bit rate (BDBR) by up to 2.31% (with only a very slight loss of up to 0.38%). It also improves time savings by up to 7.84% on two different test datasets.
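A blocky occupancy flag and a BOF-based early-termination check might look like the following Python sketch. The flag rule (a block is flagged if any of its pixels is occupied) follows the description above. The termination rule is a hypothetical example, not necessarily the paper's criterion: a CU whose blocks are all unoccupied contains only padding, so its quad-tree search is cut short. Function names are hypothetical.

```python
def blocky_occupancy_flags(occ, block):
    """BOF per block: True if any pixel in the block is occupied."""
    h, w = len(occ), len(occ[0])
    return [[any(occ[y][x]
                 for y in range(by, min(by + block, h))
                 for x in range(bx, min(bx + block, w)))
             for bx in range(0, w, block)]
            for by in range(0, h, block)]


def can_stop_splitting(flags, block, x0, y0, cu_size):
    """Hypothetical early-termination rule: stop if every block in the CU
    is unoccupied. Assumes cu_size is a multiple of block and the CU is
    aligned to the block grid."""
    n = cu_size // block
    bx0, by0 = x0 // block, y0 // block
    return not any(flags[by][bx]
                   for by in range(by0, by0 + n)
                   for bx in range(bx0, bx0 + n))


occ = [[0] * 8 for _ in range(8)]
occ[1][1] = 1
flags = blocky_occupancy_flags(occ, 4)
print(flags)                                  # [[True, False], [False, False]]
print(can_stop_splitting(flags, 4, 4, 4, 4))  # True: only padding here
```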
- Conference Article
7
- 10.1109/mmsp48831.2020.9287165
- Sep 21, 2020
With the rapid development of point cloud acquisition technologies, high-quality human-shaped point clouds are increasingly used in VR/AR applications and, more generally, in 3D graphics. To achieve near-realistic quality, such content usually contains an extremely high number of points (over 0.5 million points per 3D object per frame) and associated attributes (such as color). For this reason, efficient, dedicated 3D Point Cloud Compression (3DPCC) methods are essential. This requirement is even stronger for dynamic content, where the coordinates and attributes of the 3D points evolve over time. In this paper, we propose a novel skeleton-based 3DPCC approach dedicated to the specific case of dynamic point clouds representing humanoid avatars. The method relies on multi-view 2D human pose estimation of 3D dynamic point clouds. Using the DensePose neural network, we first extract the body parts from projected 2D images. The resulting 2D segmentation information is back-projected and aggregated in 3D space, which partitions the 3D point cloud into a set of 3D body parts. For each part, a 3D affine transform is estimated between every two consecutive frames and used for 3D motion compensation. The proposed approach has been integrated into the MPEG Video-based Point Cloud Compression (V-PCC) test model. Experimental results show that, for body motion with small amplitudes, the proposed method outperforms the V-PCC test model under lossy inter-coding conditions, with bitrate reductions of up to 83% at low bit rates. The proposed framework also has the potential to support features such as regions of interest and levels of detail.
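Estimating a 3D affine transform for one body part can be posed as a linear least-squares problem. Each point is augmented to $[x, y, z, 1]$, and the $3 \times 4$ matrix $M$ minimizing $\sum \lVert M\,[x, y, z, 1]^\top - p' \rVert^2$ is found by solving the normal equations. The Python sketch below assumes point correspondences between the two frames are already known, which the abstract does not specify. It also needs at least four non-coplanar points. It is an illustration, not the paper's estimator, and the function names are hypothetical.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_affine_3d(src, dst):
    """Least-squares 3x4 affine M with dst ~ M @ [x, y, z, 1] (assumes known
    correspondences src[i] <-> dst[i], at least 4 non-coplanar points)."""
    rows = [[x, y, z, 1.0] for x, y, z in src]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    # One independent 4-unknown problem per output coordinate.
    return [_solve(ata, [sum(r[i] * d[k] for r, d in zip(rows, dst))
                         for i in range(4)])
            for k in range(3)]


def apply_affine(M, p):
    """Motion-compensate one point with the estimated transform."""
    x, y, z = p
    return tuple(m[0] * x + m[1] * y + m[2] * z + m[3] for m in M)
```

The residual between the motion-compensated part and the next frame is what remains to be coded.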
- Research Article
29
- 10.1109/tmm.2021.3079698
- May 12, 2021
- IEEE Transactions on Multimedia
In video-based point cloud compression (V-PCC), a dynamic point cloud is projected patch by patch onto geometry and attribute videos for compression. In addition to these videos, an occupancy map video is compressed into the V-PCC bitstream. It indicates whether each two-dimensional (2D) point in the projected geometry video corresponds to a point in three-dimensional (3D) space. The occupancy map video is usually downsampled before compression to trade off bitrate against reconstructed point cloud quality. The accuracy lost in downsampling generates noisy points, which severely degrade the objective and subjective quality of the reconstructed point cloud. To improve the quality of the reconstructed point cloud, we propose using a convolutional neural network (CNN) to improve the accuracy of the occupancy map video. Our main contributions are as follows. First, we improve the accuracy of the occupancy map video by formulating the problem as binary segmentation, since the pixel values of the occupancy map video are either 0 or 1. Second, in addition to the downsampled occupancy map video, we feed the reconstructed geometry video into the CNN as a second input, providing additional information for inferring the occupancy map video. To the best of our knowledge, this is the first learning-based work to improve the performance of V-PCC. Compared with state-of-the-art schemes, our CNN-based approach produces much more accurate occupancy map videos and significant bitrate savings.
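Downsampling the occupancy map creates noisy points, as the Python sketch below shows with a round trip. It assumes the common rule that a block of `precision × precision` pixels is marked occupied if any pixel in it is occupied, followed by nearest-neighbour upsampling at the decoder. Every empty pixel that ends up marked occupied becomes a spurious reconstructed point, which is the error the CNN is trained to remove. Function names are hypothetical.

```python
def downsample_occupancy(occ, precision):
    """Mark a precision x precision block occupied if any pixel in it is."""
    h, w = len(occ), len(occ[0])
    return [[int(any(occ[y][x]
                     for y in range(by, min(by + precision, h))
                     for x in range(bx, min(bx + precision, w))))
             for bx in range(0, w, precision)]
            for by in range(0, h, precision)]


def upsample_occupancy(low, precision, h, w):
    """Nearest-neighbour upsampling back to full resolution."""
    return [[low[y // precision][x // precision] for x in range(w)]
            for y in range(h)]


def count_spurious_pixels(occ, precision):
    """Pixels that become occupied after the round trip but were empty."""
    h, w = len(occ), len(occ[0])
    rec = upsample_occupancy(downsample_occupancy(occ, precision), precision, h, w)
    return sum(1 for y in range(h) for x in range(w) if rec[y][x] and not occ[y][x])


occ = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(count_spurious_pixels(occ, 2))  # 3
print(count_spurious_pixels(occ, 4))  # 15
```

Coarser precision saves occupancy-map bits but adds spurious points. This is the bitrate versus quality tradeoff described above.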
- Research Article
8
- 10.1109/ojsp.2022.3160392
- Jan 1, 2022
- IEEE Open Journal of Signal Processing
Dynamic point clouds are widely needed in 3D vision applications such as virtual reality and telepresence. Because of their huge data volume, dynamic point cloud compression is a key technology for their effective application. The state-of-the-art dynamic point cloud compression scheme, video-based point cloud compression (V-PCC), generates 2D videos containing partly uncorrelated content because of its patch segmentation and packing process, which reduces compression efficiency. In this paper, we propose a Packing with Patch Correlation Improvement (PPCI) algorithm that adaptively removes the uncorrelated parts between matched patches during packing to improve inter-prediction performance. We first propose a basic unidirectional patch re-segmentation operator. It removes the parts of the patches in the current point cloud that are uncorrelated with the patches in its reference point cloud. The removed parts are formed into new patches and added to the patch collection of the current point cloud. We then propose a back-and-forth (BF) structure, a combination of several basic patch re-segmentation operators, that bilaterally removes the uncorrelated parts of matched patches within a BF unit. Furthermore, we propose a framework that adaptively decides the best length of each BF unit in a point cloud sequence. Experimental results show that our method achieves noticeable bitrate savings compared with existing V-PCC packing methods, particularly for sequences with small motion.
- Conference Article
- 10.1109/bmsb47279.2019.8971855
- Jun 1, 2019
Owing to their detailed and efficient representation, 3D point clouds have attracted significant attention. A dynamic point cloud consists of geometry and texture information, which often differ in their importance to visual quality. In this paper, we propose a rate allocation scheme for Unequal Error Protection (UEP) of dynamic point clouds. The proposed algorithm is based on the Video-based Point Cloud Compression (V-PCC) standard, which converts point clouds into 2D frame sequences. We allocate the code rate according to the different contributions of geometry and texture content to visual quality, as well as to frame information. Objective evaluation shows that the proposed UEP algorithm improves quality compared with the Equal Error Protection (EEP) scheme, which is also confirmed by subjective quality assessment.
- Research Article
1
- 10.1145/3690641
- Nov 18, 2024
- ACM Transactions on Multimedia Computing, Communications, and Applications
Point cloud (PC) compression is crucial for immersive visual applications and for tasks such as object classification on the road by autonomous vehicles. The Moving Picture Experts Group (MPEG) standardization group has developed video-based PC compression (V-PCC), an encoder-decoder scheme that achieves notable compression efficiency. The V-PCC encoder takes the original 3D PC data and projects it onto multiple 2D planes to generate several 2D feature images, which are then compressed with the well-established High-Efficiency Video Coding (HEVC) standard. The V-PCC decoder uses the compressed information and decoding techniques to reconstruct the 3D PC. However, PCs produced by V-PCC are often sparse and non-uniform, and they contain artifacts. In many practical applications, complete PCs must be recovered from partial ones in real time. This article presents a method for enhancing decoded PCs as a post-processing step in V-PCC with reduced computational time. The approach has two parts. First, 2D upsampling of the V-PCC occupancy image increases the density of the PC. Second, a 2D high-resolution auxiliary information modification algorithm for the 2D-to-3D conversion of high-resolution 3D PCs improves uniformity and reduces noise in the PC. The high-resolution 3D PC is further enhanced with the developed 3D outlier removal and point regeneration algorithm. Our work significantly simplifies state-of-the-art super-resolution methods for PCs and reduces time complexity by 61–75% while maintaining high PC quality.
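The abstract does not detail its 3D outlier removal, so the Python sketch below uses a different, well-known technique as a stand-in: statistical outlier removal. A point is dropped when its mean distance to its k nearest neighbours is unusually large compared with the rest of the cloud. This is not the paper's algorithm, and the parameters `k` and `std_ratio` are illustrative assumptions. The brute-force neighbour search is O(n²); a real post-processor would use a spatial index such as a k-d tree.

```python
import math
from statistics import mean, pstdev


def remove_outliers(points, k=4, std_ratio=1.0):
    """Statistical outlier removal (stand-in technique, brute force O(n^2)).

    A point is dropped when the mean distance to its k nearest neighbours
    exceeds the global mean of that quantity by more than std_ratio
    standard deviations.
    """
    if len(points) <= k:
        return list(points)
    mean_dists = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_dists.append(mean(d[:k]))
    threshold = mean(mean_dists) + std_ratio * pstdev(mean_dists)
    return [p for p, d in zip(points, mean_dists) if d <= threshold]


# A 3 x 3 grid of points plus one far-away stray point.
cloud = [(x, y, 0) for x in range(3) for y in range(3)] + [(100, 100, 100)]
print(len(remove_outliers(cloud)))  # 9: the stray point is removed
```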