Towards neural network approaches for point cloud compression
Point cloud imaging has emerged as an efficient and popular solution to represent immersive visual information. However, the large volume of data generated in the acquisition process reveals the need for efficient compression solutions in order to store and transmit such contents. Several standardization committees are in the process of finalizing efficient compression schemes to cope with the large volume of information that point clouds require. At the same time, recent efforts on learning-based compression approaches have been shown to exhibit good performance in the coding of conventional image and video contents. It is currently an open question how learning-based coding performs when applied to point cloud data. In this study, we extend recent efforts on the matter by exploring neural network implementations for separate or joint compression of geometric and textural information from point cloud contents. Two alternative architectures are presented and compared: a unified model that learns to encode point clouds in a holistic way, allowing fine-tuning for quality preservation per attribute, and a second paradigm consisting of two cascading networks that are trained separately to encode geometry and color individually. A baseline configuration from the best-performing option is compared to the MPEG anchor, showing better performance for geometry and competitive performance for color encoding at low bit-rates. Moreover, the impact of a series of parameters on the network performance is examined, such as the selection of input block resolution for training and testing, the color space, and the loss functions. Results provide guidelines for future efforts in learning-based point cloud compression.
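The abstract refers to an "input block resolution" for training and testing. As a rough illustration of what block-based input can look like, the sketch below splits a voxelized point cloud into cubic blocks and builds a binary occupancy grid per block. The function names, the block size of 32, and the dense nested-list grid are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def partition_into_blocks(points, block_size):
    """Group integer voxel coordinates into cubic blocks of side `block_size`.

    Returns a dict mapping a block index (bx, by, bz) to the set of
    block-local coordinates that are occupied inside that block.
    """
    blocks = defaultdict(set)
    for x, y, z in points:
        key = (x // block_size, y // block_size, z // block_size)
        local = (x % block_size, y % block_size, z % block_size)
        blocks[key].add(local)
    return dict(blocks)


def occupancy_grid(local_voxels, block_size):
    """Build a dense binary occupancy grid (nested lists) for one block."""
    grid = [[[0] * block_size for _ in range(block_size)]
            for _ in range(block_size)]
    for x, y, z in local_voxels:
        grid[x][y][z] = 1
    return grid


# Hypothetical toy input: four voxels of a cloud with 6-bit coordinates.
points = [(0, 0, 0), (1, 2, 3), (33, 0, 5), (63, 63, 63)]
blocks = partition_into_blocks(points, block_size=32)
first_grid = occupancy_grid(blocks[(0, 0, 0)], block_size=32)
```

Each occupancy grid could then serve as one input sample to a network; a smaller block size yields more, sparser samples, which is one way the block resolution can affect training.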
- Research Article
11
- 10.1109/access.2020.3038800
- Jan 1, 2020
- IEEE Access
Point cloud content is widely used to store and represent 3D volumetric objects with a complex and detailed representation from any viewing direction. However, the amount of data needed for point cloud content is much larger than that of 2D representations. To overcome this difficulty, MPEG has started to develop Video-based Point Cloud Compression (V-PCC), which projects point cloud content into 2D content and compresses the 2D content using conventional 2D video codecs. Compression efficiency in V-PCC can be achieved when 3D motion flow and textural conformity on 3D surfaces are preserved through 2D projections that are favorable to the 2D video codec. Because point cloud content has a complex geometry, several situations must be considered in addition to the location of adjacent points when decomposing a point from 3D coordinates to construct a 2D patch. This paper addresses the issues in such complex geometry by proposing a method that preserves 3D homogeneity in 2D patches. Comprehensive experiments are conducted to demonstrate bitrate savings of 0.5%, 0.6%, 7.8%, 7.0% and 5.5% in random access mode and 0.1%, 0.0%, 7.0%, 4.2% and 3.3% in all intra mode for D1, D2, Y, Cb, and Cr, respectively, compared to the reference software.
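The results above are reported per metric, where D1 conventionally denotes the point-to-point geometry distortion and D2 the point-to-plane distortion. A minimal sketch of a symmetric point-to-point error and the PSNR derived from it is given below. The brute-force nearest-neighbor search, the function names, and the PSNR normalization (`peak ** 2 / mse`, with no extra scaling factor) are assumptions for illustration; evaluation tools may use a different normalization.

```python
import math


def _nearest_sq_dist(p, cloud):
    """Squared Euclidean distance from point p to its nearest point in cloud."""
    return min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in cloud)


def point_to_point_mse(src, dst):
    """Mean squared nearest-neighbor distance from every point of src to dst."""
    return sum(_nearest_sq_dist(p, dst) for p in src) / len(src)


def d1_psnr(reference, degraded, peak):
    """Symmetric point-to-point PSNR: the worse of the two directions is used.

    `peak` is a caller-supplied signal peak, e.g. 1023 for 10-bit geometry
    (an assumption of this sketch).
    """
    mse = max(point_to_point_mse(reference, degraded),
              point_to_point_mse(degraded, reference))
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical toy clouds: one point is displaced by one voxel along z.
ref = [(0, 0, 0), (1, 0, 0)]
deg = [(0, 0, 0), (1, 0, 1)]
psnr = d1_psnr(ref, deg, peak=1023)
```

The search here is quadratic in the number of points, so a spatial index such as a k-d tree would be needed for real content.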
- Conference Article
64
- 10.1109/qomex48832.2020.9123076
- May 1, 2020
The proliferation of devices such as mobile phones, virtual reality headsets, and head-mounted displays has increased the popularity of immersive applications that deliver realistic representations of the real world. Among the technologies that enable such applications, the point cloud (PC) technology seems to be one of the most mature alternatives, gaining prominence in academia, industry, and standardization committees. Although PC technologies have been used in entertainment, automotive, and geographical location industries, the design of objective quality assessment methods for PC contents is still an open problem. In this paper, we introduce a texture-based objective quality assessment method for PC contents. The method analyzes the texture of the PC content using the Local Binary Pattern (LBP) descriptor. Unlike points in still (2D) images, the points in a PC are not equally distributed in space. Therefore, we adapted the LBP descriptor to allow processing a PC point and its neighboring points. The statistics of the LBP outputs, for both reference and test PCs, are computed and compared to obtain a quality estimate for the test (impaired) PC content. Experimental results show that the proposed PC quality metric has a good correlation with subjective quality scores, outperforming state-of-the-art PC quality metrics.
- Conference Article
60
- 10.1109/vcip.2017.8305131
- Dec 1, 2017
3D sensing and content capture have made significant progress in recent years, and the MPEG standardization organization is launching a new project on immersive media with point cloud compression (PCC) as one key cornerstone. In this work, we introduce a new binary-tree-based point cloud content partition and explore graph signal processing tools, especially the graph transform with optimized Laplacian sparsity, to achieve better energy compaction and compression efficiency. The resulting rate-distortion operating points are convex-hull optimized over the existing Lagrangian solutions. Simulation results with the latest high-quality point cloud content from MPEG PCC demonstrate the transform efficiency and rate-distortion (R-D) optimal potential of the proposed solutions.
- Research Article
25
- 10.1109/tmm.2022.3154927
- Jan 1, 2023
- IEEE Transactions on Multimedia
With the growth of Extended Reality (XR) and capturing devices, point cloud representation has become attractive to academics and industry. Point Cloud Compression (PCC) algorithms further promote numerous XR applications that may change our daily life. However, in the literature, PCC algorithms are often evaluated with heterogeneous datasets, metrics, and parameters, making the results hard to interpret. In this article, we propose an open-source benchmark platform called PCC Arena. Our platform is modularized in three aspects: PCC algorithms, point cloud datasets, and performance metrics. Users can easily extend PCC Arena in each aspect to fulfill the requirements of their experiments. To show the effectiveness of PCC Arena, we integrate seven PCC algorithms into PCC Arena along with six point cloud datasets. We then compare the algorithms on ten carefully selected metrics to evaluate the quality of the output point clouds. We further conduct a user study to quantify the user-perceived quality of rendered images that are produced by different PCC algorithms. Several novel insights are revealed in our comparison: (i) Signal Processing (SP)-based PCC algorithms are stable for different usage scenarios, but the trade-offs between coding efficiency and quality should be carefully addressed, (ii) Neural Network (NN)-based PCC algorithms have the potential to consume lower bitrates yet provide similar results to SP-based algorithms, (iii) NN-based PCC algorithms may generate artifacts and suffer from long running time, and (iv) NN-based PCC algorithms are worth more in-depth studies as the recently proposed NN-based PCC algorithms improve the quality and running time. We believe that PCC Arena can play an essential role in allowing engineers and researchers to better interpret and compare the performance of future PCC algorithms.
- Conference Article
67
- 10.1109/qomex48832.2020.9123121
- May 1, 2020
In this study, we explore the use of virtual reality to subjectively evaluate the visual quality of point cloud contents. To this aim, we develop the PointXR toolbox, a set of Unity applications that can host experiments under variants of interactive and passive evaluation protocols. An auxiliary tool to facilitate the configuration of the supported rendering schemes for point cloud visualization is provided as part of it. Our toolbox is employed to conduct two validating experiments in a virtual environment with 6 degrees of freedom. The purpose is to assess the performance of color encoders that are incorporated in the upcoming MPEG standard on point cloud compression. For this study, we convert a set of mesh models to point cloud contents and form a high-quality cultural heritage repository, namely, the PointXR dataset. A comparison between the adopted protocols and the codecs' performance is carried out based on the ratings obtained from both experiments. Finally, interactivity patterns based on behavioral data that were recorded during the evaluations are extracted, and results are discussed. The PointXR toolbox, the PointXR dataset, and the experimental results are made publicly available.
- Conference Article
1
- 10.1117/12.2526702
- Sep 6, 2019
The point cloud is a medium that visualizes various information by placing points, each with a color value and a geometry value, in a three-dimensional space. A point cloud can use from dozens to millions of points to visualize information, and the key to commercializing point cloud video is to efficiently compress this large amount of information and transmit it to users. Currently, MPEG V-PCC conducts dynamic point cloud compression using a 2D video codec, where motion estimation is performed on 2D video sequences. Thus, there is a limitation in estimating the motion in 3D point cloud contents. In this paper, we propose a method that uses 3D motion for point cloud video compression. The proposed technology achieves an efficient compression rate and improves accuracy in lossy compression.
- Conference Article
7
- 10.1109/mmsp48831.2020.9287165
- Sep 21, 2020
With the rapid development of point cloud acquisition technologies, high-quality human-shape point clouds are increasingly used in VR/AR applications and in 3D graphics in general. To achieve near-realistic quality, such content usually contains an extremely high number of points (over 0.5 million points per 3D object per frame) and associated attributes (such as color). For this reason, having efficient, dedicated 3D Point Cloud Compression (3DPCC) methods becomes mandatory. This requirement is even stronger in the case of dynamic content, where the coordinates and attributes of the 3D points evolve over time. In this paper, we propose a novel skeleton-based 3DPCC approach, dedicated to the specific case of dynamic point clouds representing humanoid avatars. The method relies on a multi-view 2D human pose estimation of 3D dynamic point clouds. By using the DensePose neural network, we first extract the body parts from projected 2D images. The obtained 2D segmentation information is back-projected and aggregated into the 3D space. This procedure makes it possible to partition the 3D point cloud into a set of 3D body parts. For each part, a 3D affine transform is estimated between every two consecutive frames and used for 3D motion compensation. The proposed approach has been integrated into the Video-based Point Cloud Compression (V-PCC) test model of MPEG. Experimental results show that the proposed method, in the particular case of body motion with small amplitudes, outperforms the V-PCC test model in the lossy inter-coding condition by up to 83% in terms of bitrate reduction in low bit rate conditions. Meanwhile, the proposed framework holds the potential of supporting various features such as regions of interest and levels of detail.
- Research Article
3
- 10.13052/jwe1540-9589.2232
- Jul 3, 2023
- Journal of Web Engineering
This paper proposes a point cloud (PC) visual quality assessment (VQA) framework that reflects the human visual system (HVS). The proposed framework compares natural images acquired using a digital camera and PC images generated via 2D projection in terms of appropriate objective quality evaluation metrics. Humans primarily consume natural images; thus, human knowledge is typically formed from natural images. Thus, natural images can be more reliable reference data than PC data. The proposed framework performs an image alignment process based on feature matching and image warping to use the natural images as a reference which enhances the similarities of the acquired natural and corresponding PC images. The framework facilitates identifying which objective VQA metrics can be used to reflect the HVS effectively. We constructed a database of natural images and three PC image qualities, and objective and subjective VQAs were conducted. The experimental result demonstrates that the acceptable consistency among different PC qualities appears in the metrics that compare the global structural similarity of images. We found that the SSIM, MAD, and GMSD achieved remarkable Spearman rank-order correlation coefficient scores of 0.882, 0.871, and 0.930, respectively. Thus, the proposed framework can reflect the HVS by comparing the global structural similarity between PC and natural reference images.
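The Spearman rank-order correlation coefficient (SROCC) reported above measures how well the ranking given by an objective metric agrees with the ranking given by subjective scores. A minimal sketch is shown below: both score lists are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is computed. The function names and the toy scores are assumptions for illustration and are not data from the paper.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of `values`; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def srocc(objective_scores, subjective_scores):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(objective_scores),
                   average_ranks(subjective_scores))


# Hypothetical scores for five stimuli (not taken from the paper).
metric_scores = [1, 2, 3, 4, 5]
opinion_scores = [2, 1, 4, 3, 5]
rho = srocc(metric_scores, opinion_scores)
```

Because only ranks matter, SROCC is insensitive to any monotonic rescaling of the metric, which is why it is commonly used to compare metrics with different output ranges.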
- Research Article
1
- 10.1109/tgrs.2025.3573206
- Jan 1, 2025
- IEEE Transactions on Geoscience and Remote Sensing
LiDAR point cloud (LPC) compression is an indispensable component for 3D vision tasks, especially for dynamic point clouds. However, the existing methods based on traditional spatial-temporal attention are immature, causing little improvement in inter-frame feature extraction. In this paper, we propose Diverse Attention-based Point Cloud Compression (DAPCC), an LPC compression entropy model combining aggregation embedding modules for temporal point matching and spatial-temporal attention blocks for dynamic Octree node encoding, which can effectively utilize the change information of dynamic point clouds. Specifically, we first introduce aggregation embedding to match the Octree sequences from two sweeps to establish temporal correlation. To effectively capture the feature details, we further design local and global combined attention for the spatial-temporal information of point clouds which can focus on the whole context. Finally, we organize a symmetric MLP module capable of strengthening vital features. We conduct experiments of static and dynamic compression on both indoor/outdoor point cloud benchmark datasets (i.e., ScanNet, SemanticKITTI, and MPEG Common Test Conditions (CTC) Category 3 datasets) and downstream applications (i.e., vehicle detection and semantic segmentation). Compared with the previous state-of-the-art methods, our method achieves up to 14.7% bpp and 45% decoding time savings and adapts to the downstream tasks with almost no impact on performance.
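Octree-based coders such as the one described above represent geometry as a sequence of per-node occupancy symbols, and the entropy model predicts the probability of each symbol. The sketch below only illustrates how such 8-bit occupancy symbols can be produced breadth-first from integer coordinates. The child bit ordering, the traversal order, and the function name are assumptions for this example; it does not reproduce DAPCC or any attention-based context model.

```python
def octree_occupancy(points, depth):
    """Return breadth-first 8-bit occupancy codes for points in [0, 2**depth)^3.

    Bit i of a code is set when child octant i of the node contains at
    least one point. Octant index = (x_high << 2) | (y_high << 1) | z_high.
    """
    nodes = {(0, 0, 0): list(points)}  # node origin -> points inside the node
    size = 1 << depth
    codes = []
    for _ in range(depth):
        half = size // 2
        next_nodes = {}
        for origin in sorted(nodes):  # fixed order so the decoder can mirror it
            ox, oy, oz = origin
            children = {}
            for x, y, z in nodes[origin]:
                idx = (((x - ox >= half) << 2)
                       | ((y - oy >= half) << 1)
                       | int(z - oz >= half))
                children.setdefault(idx, []).append((x, y, z))
            code = 0
            for idx in children:
                code |= 1 << idx
            codes.append(code)
            for idx, pts in children.items():
                child_origin = (ox + half * ((idx >> 2) & 1),
                                oy + half * ((idx >> 1) & 1),
                                oz + half * (idx & 1))
                next_nodes[child_origin] = pts
        nodes = next_nodes
        size = half
    return codes


# Hypothetical toy input: two opposite corners of a 4x4x4 volume.
codes = octree_occupancy([(0, 0, 0), (3, 3, 3)], depth=2)
```

Each code takes one of 255 non-empty values, so a learned model that assigns accurate probabilities to these symbols directly reduces the bits spent by the arithmetic coder.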
- Research Article
15
- 10.1109/access.2022.3148252
- Jan 1, 2022
- IEEE Access
A point cloud acquired through a Light Detection And Ranging (LiDAR) sensor can be represented as a sequence of frames along a time axis. Since consecutive point cloud frames are highly correlated, a higher compression efficiency can be obtained by using an inter-prediction scheme. For this purpose, Geometry-based Point Cloud Compression (G-PCC) in the Moving Picture Experts Group (MPEG) opened the Inter-Exploratory Model (Inter-EM), which experiments on compressing consecutive LiDAR-based point cloud frames through inter-prediction. The points of a LiDAR-based point cloud have two different types of motion: global motion brought about by the vehicle carrying the LiDAR sensor, and local motion generated by an object, e.g., a walking person. Thus, Inter-EM consists of a compression structure in terms of both global and local motion, and Inter-EM's global motion compensation technology increases the compression efficiency via a single matrix describing the global motion of points. However, global motion is difficult to predict with a single matrix, which causes imprecise global motion estimation, since the objects in a LiDAR-based point cloud show different global motion estimates according to object characteristics such as shape and position. Therefore, this paper proposes a global motion prediction and compensation scheme that considers the characteristics of objects for efficient compression of LiDAR-based point cloud frames. The proposed global motion prediction and compensation scheme achieved a maximum gain of −22.0% and an average gain of −9.4% in terms of the Bjontegaard-Delta-rate (BD-rate), and effectively compressed the LiDAR-based sparse point cloud.
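To make the "single matrix" idea concrete, the sketch below applies one 4x4 homogeneous transform to every point of a previous frame to form a motion-compensated prediction. The function name and the example matrix are assumptions for illustration; this does not reproduce Inter-EM or the object-aware scheme the paper proposes.

```python
def apply_global_motion(points, matrix):
    """Apply a 4x4 homogeneous transform (row-major nested lists) to 3D points.

    Each point (x, y, z) is extended to (x, y, z, 1), multiplied by the
    matrix, and the first three components of the result are returned.
    """
    compensated = []
    for x, y, z in points:
        v = (x, y, z, 1.0)
        compensated.append(tuple(
            sum(matrix[r][c] * v[c] for c in range(4)) for r in range(3)
        ))
    return compensated


# Hypothetical global motion: 90-degree rotation about z plus a translation.
motion = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
previous_frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = apply_global_motion(previous_frame, motion)
```

Since every point is moved by the same transform, objects whose apparent motion differs from the sensor's ego-motion are predicted poorly, which is the limitation the abstract points to.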
- Research Article
6
- 10.3390/s25061660
- Mar 7, 2025
- Sensors (Basel, Switzerland)
This meta-survey provides a comprehensive review of 3D point cloud (PC) applications in remote sensing (RS), essential datasets available for research and development purposes, and state-of-the-art point cloud compression methods. It offers a comprehensive exploration of the diverse applications of point clouds in remote sensing, including specialized tasks within the field, precision agriculture-focused applications, and broader general uses. Furthermore, datasets that are commonly used in remote-sensing-related research and development tasks are surveyed, including urban, outdoor, and indoor environment datasets; vehicle-related datasets; object datasets; agriculture-related datasets; and other more specialized datasets. Due to their importance in practical applications, this article also surveys point cloud compression technologies from widely used tree- and projection-based methods to more recent deep learning (DL)-based technologies. This study synthesizes insights from previous reviews and original research to identify emerging trends, challenges, and opportunities, serving as a valuable resource for advancing the use of point clouds in remote sensing.
- Research Article
13
- 10.1186/s13640-024-00626-3
- Aug 9, 2024
- EURASIP Journal on Image and Video Processing
Point clouds denote a prominent solution for the representation of 3D photo-realistic content in immersive applications. Similarly to other imaging modalities, quality predictions for point cloud contents are vital for a wide range of applications, enabling trade-off optimizations between data quality and data size in every processing step from acquisition to rendering. In this work, we focus on use cases that consider human end-users consuming point cloud contents and, hence, we concentrate on visual quality metrics. In particular, we propose a set of perceptually relevant descriptors based on principal component analysis (PCA) decomposition, which is applied to both geometry and texture data for full-reference point cloud quality assessment. Statistical features are derived from these descriptors to characterize local shape and appearance properties for both a reference and a distorted point cloud. The extracted statistical features are subsequently compared to provide corresponding predictions of visual quality for the distorted point cloud. As part of our method, a learning-based approach is proposed to fuse these individual predictors to a unified perceptual score. We validate the accuracy of the individual predictors, as well as the unified quality scores obtained after regression against subjectively annotated datasets, showing that our metric outperforms state-of-the-art solutions. Insights regarding design decisions are provided through exploratory studies, evaluating the performance of our metric under different parameter configurations, attribute domains, color spaces, and regression models. A software implementation of the proposed metric is made available at the following link: https://github.com/cwi-dis/pointpca.
- Conference Article
6
- 10.1109/vcip56404.2022.10008821
- Dec 13, 2022
With the increased popularity of immersive media, point clouds have become one of the popular data representations for presenting 3D scenes. The huge amount of point cloud data poses a great challenge on their storage and real-time transmission, which calls for efficient point cloud compression. This paper presents a novel point cloud geometry compression technique based on learning end-to-end an augmented normalizing flow (ANF) model to represent the occupancy status of voxelized data points. The higher expressive power of ANF than variational autoencoders (VAE) is leveraged for the first time to represent binary occupancy status. Compared to two coding standards developed by MPEG, namely G-PCC (geometry-based point cloud compression) and V-PCC (video-based point cloud compression), our method achieves more than 80% and 30% bitrate reduction, respectively. Compared to several learning-based methods, our method also yields better performance.
- Conference Article
37
- 10.1145/3469877.3490611
- Dec 1, 2021
The ever-increasing number of 3D applications makes point cloud compression more important than ever. In this paper, we propose a patch-based compression process using deep learning, focusing on lossy point cloud geometry compression. Unlike existing point cloud compression networks, which apply feature extraction and reconstruction on the entire point cloud, we divide the point cloud into patches and compress each patch independently. In the decoding process, we assemble the decompressed patches into a complete point cloud. In addition, we train our network with a patch-to-patch criterion, i.e., we use the local reconstruction loss for optimization, to approximate the global reconstruction optimality. Our method outperforms the state-of-the-art in terms of rate-distortion performance, especially at low bitrates. Moreover, the proposed compression process is guaranteed to generate the same number of points as the input. The network model of this method can be easily applied to other point cloud reconstruction problems, such as upsampling.
- Conference Article
9
- 10.1109/dcc50243.2021.00085
- Mar 1, 2021
3D point clouds have been widely applied in virtual reality and augmented reality. Representing a complex 3D scene requires a large number of points and a lot of storage space. Thus, point cloud compression has become a crucial research issue. In this paper, we propose a novel autoencoder-based lossy geometric compression method with DCGAN optimization. This method can reconstruct a high-quality point cloud and addresses the large areas of missing points that arise in the process of compression and decompression. To improve the point cloud codec performance, we propose a multi-scale 3D deconvolution hopping connection structure to obtain a better-quality reconstructed point cloud under low bit rates. To our knowledge, our approach is the first GAN-based point cloud compression algorithm. Compared with state-of-the-art methods on the MVUB dataset, our approach achieves better rate-distortion performance and visual quality.