Learning-Based Point Cloud Decoding with Independent and Scalable Reduced Complexity

Abstract

Point Clouds (PCs) have gained significant attention due to their usage in diverse application domains, notably virtual and augmented reality. While PCs excel at providing detailed 3D visualization, this typically requires millions of points, which must be efficiently coded for real-world deployment, notably storage and streaming. Recently, learning-based coding solutions have been adopted, notably in the JPEG Pleno Point Cloud Coding (PCC) standard, whose coding model has millions of parameters. This requires high-performance computing devices, which may not be available, particularly at the decoder side. In this context, this paper proposes two reduced-complexity decoding solutions, based on the adoption of scalability principles, that decode the same JPEG PCC compliant bitstreams. These solutions are built on two innovative model pruning strategies that reduce the decoding complexity. The experimental results demonstrate that the number of decoding model parameters can be significantly reduced with an acceptable penalty in Rate-Distortion (RD) performance compared to the full-complexity model.
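The abstract does not detail the two pruning strategies, but the underlying idea of shrinking a decoder's parameter count can be illustrated with classic magnitude pruning, a common baseline. The sketch below is illustrative only (the function name and threshold rule are assumptions, not the paper's method): it zeroes the smallest-magnitude weights of a layer.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-magnitude entries
    zeroed so that roughly `keep_ratio` of them survive."""
    flat = np.abs(weights).ravel()
    k = int(round(keep_ratio * flat.size))
    if k == 0:
        return np.zeros_like(weights)
    # Threshold = magnitude of the k-th largest entry.
    threshold = np.sort(flat)[-k]
    return np.where(np.abs(weights) >= threshold, weights, 0.0)
```

Applied layer by layer to a decoder, this reduces the number of non-zero parameters at the cost of some reconstruction quality, the same trade-off the paper quantifies as an RD penalty.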

Similar Papers
  • Conference Article
  • Cited by 6
  • 10.1109/euvip53989.2022.9922784
Double-Deep Learning-Based Point Cloud Geometry Coding with Adaptive Super-Resolution
  • Sep 11, 2022
  • Manuel Ruivo + 2 more

Point clouds represent 3D visual data in a very immersive and realistic way, offering users a large degree of navigation and interaction. For some key use cases, point clouds are potentially lighter and easier to acquire than other 3D representation models, thus offering an alternative with lower computational cost. To offer visually realistic and immersive experiences, notably the illusion of well-formed surfaces, point clouds typically require a large number of points. To make their storage and transmission feasible, efficient point cloud coding is essential. Recently, deep learning-based point cloud coding solutions have proven competitive in compression performance, excelling in distinct scenarios, although struggling to achieve similar results for sparser point clouds and lower coding rates. To tackle these limitations, this paper proposes a double-deep learning-based approach for point cloud coding by integrating a super-resolution tool. The main idea consists of converting sparser point clouds into denser ones via a down-sampling step performed before coding. Since this is a lossy process, a super-resolution step is included after decoding to mitigate the point losses and bring the point cloud back to its initial resolution. Furthermore, the sampling factor can be adaptively selected, thus offering additional flexibility to match the point cloud characteristics. The proposed double-deep coding and super-resolution solution outperforms both the G-PCC Octree and V-PCC Intra point cloud coding standards, achieving, respectively, 81.9% and 22.3% rate reduction measured as BD-Rate for the PSNR D1 metric.
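To make the down-sample-then-super-resolve pipeline concrete, here is a minimal sketch of the lossy down-sampling step on voxelized integer coordinates. The function names are illustrative assumptions, and `naive_upsample` is only a placeholder for the learned super-resolution step the paper applies after decoding.

```python
import numpy as np

def downsample(points: np.ndarray, factor: int) -> np.ndarray:
    """Lossy down-sampling of voxelized integer coordinates: divide by
    `factor` and drop the duplicate voxels that collapse together."""
    return np.unique(points // factor, axis=0)

def naive_upsample(points: np.ndarray, factor: int) -> np.ndarray:
    """Scale coordinates back up; a learned super-resolution model would
    additionally repopulate the points lost by `downsample`."""
    return points * factor
```

Because distinct points can collapse into the same coarse voxel, the round trip is lossy, which is exactly why a super-resolution stage is needed to approach the initial resolution.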

  • Conference Article
  • Cited by 15
  • 10.1109/icassp49357.2023.10096294
NF-PCAC: Normalizing Flow Based Point Cloud Attribute Compression
  • Jun 4, 2023
  • Rodrigo B Pinheiro + 3 more

Learning-based point cloud (PC) compression is a promising research avenue to reduce the transmission and storage costs for PC applications. Existing learning-based methods to compress PCs have mainly focused on geometry and employ variational autoencoders to learn compact signal representations. However, autoencoders leverage low-dimensional bottlenecks that limit the maximum reconstruction quality, even at high bitrates. In this paper, we propose a different and novel approach to compress PC attributes by using normalizing flows. Since normalizing flows model invertible transforms, the proposed approach can achieve better reconstruction quality than variational autoencoders over a large range of bitrates. Our Normalizing Flow-based Point Cloud Attribute Compression (NF-PCAC) outperforms previous learning-based methods for attribute compression, and has comparable performance as G-PCC v.14, showing the potential of this scheme for PC compression.
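The key property exploited above, that normalizing flows are exactly invertible and so avoid a lossy bottleneck, can be illustrated with a toy additive coupling layer, a standard flow building block. This is a generic sketch under assumed names, not the NF-PCAC architecture.

```python
import numpy as np

def coupling_forward(x, shift_net):
    """Additive coupling layer: the second half of the channels is shifted
    by a function of the first half, so the transform is exactly invertible
    (no low-dimensional bottleneck, unlike a variational autoencoder)."""
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([x1, x2 + shift_net(x1)], axis=-1)

def coupling_inverse(y, shift_net):
    """Exact inverse: subtract the same shift computed from the untouched half."""
    y1, y2 = np.split(y, 2, axis=-1)
    return np.concatenate([y1, y2 - shift_net(y1)], axis=-1)
```

Stacking such layers (with permutations between them) gives an invertible transform whose reconstruction quality is not capped by a bottleneck dimension, which is the motivation the abstract gives for preferring flows at high bitrates.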

  • Research Article
  • Cited by 11
  • 10.3390/rs13234917
Point Projection Network: A Multi-View-Based Point Completion Network with Encoder-Decoder Architecture
  • Dec 3, 2021
  • Remote Sensing
  • Weichao Wu + 4 more

Recently, unstructured 3D point clouds have been widely used in remote sensing applications. However, incomplete point clouds inevitably arise, primarily due to viewing-angle and occlusion limitations. Therefore, point cloud completion is an urgent problem in point cloud data applications. Most existing deep learning methods first generate a rough framework from the global characteristics of the incomplete point cloud, and then generate the complete point cloud by refining that framework. However, such point clouds are undesirably biased toward average existing objects, meaning that the completion results lack local details. Thus, we propose a multi-view-based shape-preserving point completion network with an encoder-decoder architecture, termed the point projection network (PP-Net). PP-Net completes and optimizes the defective point cloud in a projection-to-shape manner in two stages. First, a new feature point extraction method is applied to the projections of the point cloud to extract feature points in multiple directions. Second, more realistic complete point clouds with finer profiles are yielded by encoding and decoding the feature points from the first stage. Meanwhile, projection losses in multiple directions and an adversarial loss are combined to optimize the model parameters. Qualitative and quantitative experiments on the ShapeNet dataset indicate that our method achieves good results among learning-based point cloud shape completion methods in terms of chamfer distance (CD) error. Furthermore, PP-Net is robust to the deletion of multiple parts and to different levels of incomplete data.
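The chamfer distance (CD) used above to score completion quality has a simple brute-force form for small point sets. This sketch uses the common squared nearest-neighbour definition; the exact variant used in the paper may differ.

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric chamfer distance between two point sets: for each point,
    take the squared distance to its nearest neighbour in the other set,
    then average both directions and sum them."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

The O(N*M) pairwise matrix makes this fine for evaluation-sized sets; large clouds would use a KD-tree for the nearest-neighbour queries instead.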

  • Research Article
  • Cited by 19
  • 10.1016/j.autcon.2024.105473
Automated BIM-to-scan point cloud semantic segmentation using a domain adaptation network with hybrid attention and whitening (DawNet)
  • May 18, 2024
  • Automation in Construction
  • Difeng Hu + 2 more


  • Conference Article
  • Cited by 5
  • 10.1109/ism55400.2022.00016
Impact of Conventional and Deep Learning-based Point Cloud Geometry Coding on Deep Learning-based Classification Performance
  • Dec 1, 2022
  • Abdelrahman Seleem + 3 more

Deep learning (DL)-based point cloud (PC) classification is a key computer vision task for many applications, notably autonomous driving, surveillance, and cultural heritage. In many application scenarios, PCs must be coded to reach practical rates for storage and transmission purposes, and thus they suffer from more or less intense compression artifacts. After the specification of two MPEG PC coding standards, DL-based PC coding has gained momentum, reaching competitive compression performance, especially for dense PCs. Since decoded PCs may suffer from compression artifacts that impact the final classification performance, the main goal of this paper is to study the impact of static PC geometry coding on DL-based classification. This study is performed on the ModelNet40 test dataset using the conventional G-PCC coding standard and the DL-based PC geometry codec that was the top-performing solution responding to the recent JPEG Pleno PC Coding Call for Proposals. Two high-performing DL-based classifiers are used, considering the original PC geometry before and after voxelization, as well as the decoded PC geometry at different rates and qualities. As expected, coding has an impact on classification performance, especially at the lower rates/qualities. For very sparse PCs, conventional coding still holds an advantage, contrary to dense PCs, but this should change in the future as DL-based tools become the natural solution for both PC geometry coding and classification.

  • Research Article
  • Cited by 1
  • 10.3390/electronics11193157
Sparse 3D Point Cloud Parallel Multi-Scale Feature Extraction and Dense Reconstruction with Multi-Headed Attentional Upsampling
  • Oct 1, 2022
  • Electronics
  • Meng Wu + 2 more

Three-dimensional (3D) point clouds have a wide range of applications in the field of 3D vision. The quality of the acquired point cloud data considerably impacts the subsequent point cloud processing. Due to the sparsity and irregularity of point cloud data, processing them has always been challenging. Existing deep learning-based point cloud dense reconstruction methods suffer from over-smoothed reconstruction results and too many outliers, because they cannot extract local and global features at different scales, nor pay different levels of attention to different regions, which is needed to capture the long-distance dependencies required for dense reconstruction. In this paper, we use a parallel multi-scale (PMS) feature extraction module based on graph convolution, together with an upsampling method with an added multi-head attention mechanism, to process sparse and irregular point cloud data into extended point clouds. Specifically, a point cloud training patch with 256 points is input. The PMS module uses three residual connections in the multi-scale feature extraction stage and consists of three parallel DenseGCN modules with different convolution kernel sizes and different average pooling sizes. The local and global feature information of the augmented receptive field is extracted efficiently, and the scale information is obtained by averaging the differently pooled augmented receptive fields. The upsampling stage uses an upsampling rate of r = 4. The self-attentive features, obtained by fusing different weights so that different point cloud regions receive different focus, make the feature representation more diverse. This avoids the bias of a single attention head, with each head extracting valuable fine-grained feature information.
Finally, the coordinate reconstruction module outputs 1024 dense points. Experiments show that the proposed method achieves good evaluation metrics and visual quality. The problems of over-smoothing and excessive outliers are effectively mitigated, and the reconstructed point cloud is considerably denser than the sparse input.
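The multi-head attention used in the upsampling stage builds on scaled dot-product attention over per-point features. The sketch below is a generic single-head version, not the paper's exact module; a multi-head variant runs it in parallel on channel splits and concatenates the results.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: each query forms a softmax-weighted
    average of the value rows, weighted by query-key similarity."""
    scores = q @ k.T / np.sqrt(k.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v
```

Fusing several such heads, each attending to different regions of the point set, is what gives the "different focus on different point cloud data regions" described above.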

  • Research Article
  • Cited by 21
  • 10.1109/access.2020.2973003
Sparse-to-Dense Multi-Encoder Shape Completion of Unstructured Point Cloud
  • Jan 1, 2020
  • IEEE Access
  • Yanjun Peng + 6 more

Unstructured point clouds are a representative shape representation of real-world scenes in 3D vision and graphics. Incompletion inevitably arises due to the way the set of unorganized points is captured, e.g., as fusion of depth images, merged laser scans, or structure-from-x. In this paper, an end-to-end sparse-to-dense multi-encoder neural network (termed SDME-Net) is proposed for uniformly completing an unstructured point cloud with its shape details preserved. Unlike most existing learning-based shape completion methods, which operate on 2D image representations or 3D voxelizations of point clouds and require priors on the underlying shape's structure, topology, and annotations, SDME-Net works on the incomplete, and even noisy, point cloud without any transformation, and makes no specific assumptions about the incompletion distribution or the geometry features of the input. Specifically, the defective point cloud is completed and optimized in a sparse-to-dense manner over two stages. In the first stage, we generate a sparse but complete point cloud based on a bistratal PointNet; in the second stage, we yield a dense and high-fidelity point cloud by encoding and decoding the sparse result of the first stage using PointNet++. Meanwhile, we combine a distance loss and a repulsion loss to generate more uniformly distributed output point clouds closer to their ground-truth counterparts. Qualitative and quantitative experiments on the public ShapeNet dataset illustrate that our approach outperforms state-of-the-art learning-based point cloud shape completion methods in terms of real structure recovery, uniformity, and noise/partiality robustness.

  • Conference Article
  • Cited by 21
  • 10.1109/mmsp48831.2020.9287060
Deep Learning-based Point Cloud Geometry Coding with Resolution Scalability
  • Sep 21, 2020
  • Andre F R Guarda + 2 more

Point clouds are a 3D visual representation format that has recently become fundamentally important for immersive and interactive multimedia applications. Considering the high number of points of practically relevant point clouds, and their increasing market demand, efficient point cloud coding has become a vital research topic. In addition, scalability is an important feature for point cloud coding, especially for real-time applications, where the fast and rate efficient access to a decoded point cloud is important; however, this issue is still rather unexplored in the literature. In this context, this paper proposes a novel deep learning-based point cloud geometry coding solution with resolution scalability via interlaced sub-sampling. As additional layers are decoded, the number of points in the reconstructed point cloud increases as well as the overall quality. Experimental results show that the proposed scalable point cloud geometry coding solution outperforms the recent MPEG Geometry-based Point Cloud Compression standard which is much less scalable.
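Resolution scalability via interlaced sub-sampling can be pictured as splitting the voxelized points into parity phases, with a base layer carrying one phase and each enhancement layer adding another. The 8-phase split below is an illustrative assumption about the interlacing scheme, not the paper's exact design.

```python
import numpy as np

def interlace_layers(points: np.ndarray) -> list:
    """Split voxelized integer points into 8 interlaced phases by the
    parity of their (x, y, z) coordinates; decoding more phases adds
    points, and hence quality, to the reconstruction."""
    phase = (points[:, 0] % 2) * 4 + (points[:, 1] % 2) * 2 + points[:, 2] % 2
    return [points[phase == p] for p in range(8)]
```

Each point belongs to exactly one phase, so the layers partition the cloud: truncating the bitstream after k layers yields a valid, sparser reconstruction.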

  • Conference Article
  • 10.1109/robio49542.2019.8961496
From Virtuality To Reality: A Learning-based Point Cloud Labeling Method With Synthesis Scene
  • Dec 1, 2019
  • Runjian Chen + 4 more

This paper proposes a machine learning-based point cloud labeling algorithm. To classify the points of a sparse scan, of either a virtual or a real scene, into basic geometrical elements such as planes and edges, a rendered dataset in a virtual environment is created and labeled. Principal component analysis (PCA) is then applied to calculate local geometrical features of the point cloud. An in-depth analysis is performed by training several machine learning models on the PCA features, and experiments in which the trained models are applied to both rendered point clouds and laser scans of real scenes validate that our approach is scale-invariant and effective on both kinds of data.
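The PCA-based local geometrical features mentioned above are commonly eigenvalue ratios of the neighbourhood covariance matrix; planes and edges produce distinctive ratio patterns. This is a minimal sketch of that idea, since the exact feature set used in the paper is not specified here.

```python
import numpy as np

def pca_features(neighborhood: np.ndarray):
    """Eigenvalue-based local shape descriptors from the covariance of a
    point's neighbourhood: linearity, planarity, sphericity (with
    eigenvalues sorted l1 >= l2 >= l3)."""
    evals = np.sort(np.linalg.eigvalsh(np.cov(neighborhood.T)))[::-1]
    l1, l2, l3 = evals
    return (l1 - l2) / l1, (l2 - l3) / l1, l3 / l1
```

For an edge-like neighbourhood linearity dominates, for a planar patch planarity does, and for volumetric noise all three eigenvalues are similar; because the features are ratios, they are scale-invariant, matching the claim in the abstract.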

  • Research Article
  • Cited by 13
  • 10.1016/j.isprsjprs.2024.03.010
Plant-Denoising-Net (PDN): A plant point cloud denoising network based on density gradient field learning
  • Apr 1, 2024
  • ISPRS Journal of Photogrammetry and Remote Sensing
  • Jianeng Wu + 4 more


  • Research Article
  • Cited by 10
  • 10.3390/s22166210
An Efficient Ensemble Deep Learning Approach for Semantic Point Cloud Segmentation Based on 3D Geometric Features and Range Images
  • Aug 18, 2022
  • Sensors (Basel, Switzerland)
  • Muhammed Enes Atik + 1 more

Mobile light detection and ranging (LiDAR) sensor point clouds are used in many fields, such as road network management, architecture and urban planning, and 3D High Definition (HD) city maps for autonomous vehicles. Semantic segmentation of mobile point clouds is critical for these tasks. In this study, we present a robust and effective deep learning-based point cloud semantic segmentation method. Semantic segmentation is applied to range images produced from the point cloud by spherical projection: the irregular 3D mobile point cloud is transformed into a regular form by projecting it onto a plane, generating a 2D representation that is fed to the proposed network to produce the semantic segmentation. The local geometric feature vector is calculated for each point, and optimum parameter experiments were performed to obtain the best segmentation results. The proposed technique, called SegUNet3D, is an ensemble approach combining the U-Net and SegNet algorithms. SegUNet3D has been compared with five different segmentation algorithms on two challenging datasets: SemanticPOSS covers an urban area, whereas RELLIS-3D covers an off-road environment. The study demonstrates that the proposed approach is superior to the other methods in terms of mean Intersection over Union (mIoU) on both datasets, improving mIoU by up to 15.9% on SemanticPOSS and up to 5.4% on RELLIS-3D.
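The spherical projection that turns an irregular LiDAR point cloud into a regular range image can be sketched as below. The image size and vertical field of view are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def spherical_projection(points, h=32, w=64,
                         fov_up=np.radians(15.0), fov_down=np.radians(-25.0)):
    """Project 3D points onto an h x w range image: azimuth maps to the
    column, elevation to the row, and the range becomes the pixel value."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                                 # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)                               # elevation
    col = ((0.5 * (1.0 - yaw / np.pi)) * w).astype(int) % w
    row = ((fov_up - pitch) / (fov_up - fov_down) * h).astype(int)
    row = row.clip(0, h - 1)
    image = np.zeros((h, w))
    image[row, col] = r                                    # store range per pixel
    return image
```

The resulting 2D grid is what allows standard image segmentation networks such as U-Net or SegNet to be applied to LiDAR data.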

  • Conference Article
  • Cited by 44
  • 10.1117/12.2569115
Towards neural network approaches for point cloud compression
  • Aug 21, 2020
  • Evangelos Alexiou + 2 more

Point cloud imaging has emerged as an efficient and popular solution to represent immersive visual information. However, the large volume of data generated in the acquisition process reveals the need for efficient compression solutions in order to store and transmit such contents. Several standardization committees are in the process of finalizing efficient compression schemes to cope with the large volume of information that point clouds require. At the same time, recent efforts on learning-based compression approaches have been shown to exhibit good performance in the coding of conventional image and video contents. It is currently an open question how learning-based coding performs when applied to point cloud data. In this study, we extend recent efforts on the matter by exploring neural network implementations for separate, or joint, compression of geometric and textural information from point cloud contents. Two alternative architectures are presented and compared: a unified model that learns to encode point clouds in a holistic way, allowing fine-tuning for quality preservation per attribute, and a second paradigm consisting of two cascading networks that are trained separately to encode geometry and color. A baseline configuration from the best-performing option is compared to the MPEG anchor, showing better performance for geometry and competitive performance for color encoding at low bit-rates. Moreover, the impact of a series of parameters on network performance is examined, such as the selection of input block resolution for training and testing, the color space, and the loss functions. Results provide guidelines for future efforts in learning-based point cloud compression.

  • Research Article
  • Cited by 17
  • 10.1145/3550454.3555481
3QNet
  • Nov 30, 2022
  • ACM Transactions on Graphics
  • Tianxin Huang + 7 more

Since the development of 3D applications, the point cloud, as a spatial description easily acquired by sensors, has been widely used in multiple areas such as SLAM and 3D reconstruction. Point Cloud Compression (PCC) has also attracted more attention as a primary step before point cloud transfer and storage, where geometry compression is an important component of PCC that compresses the points' geometrical structure. However, existing non-learning-based geometry compression methods are often limited by manually pre-defined compression rules. Though learning-based compression methods can significantly improve algorithm performance by learning compression rules from data, they still have some defects. Voxel-based compression networks introduce precision errors due to the voxelization operations, while point-based methods may have relatively weak robustness and are mainly designed for sparse point clouds. In this work, we propose a novel learning-based point cloud compression framework named 3D Point Cloud Geometry Quantization Compression Network (3QNet), which overcomes the robustness limitation of existing point-based methods and can handle dense points. By learning a codebook of common structural features from simple and sparse shapes, 3QNet can efficiently deal with multiple kinds of point clouds. According to experiments on object models, indoor scenes, and outdoor scans, 3QNet achieves better compression performance than many representative methods.
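The codebook idea behind 3QNet, representing local structures by indices into a learned dictionary, can be illustrated with plain nearest-codeword quantization. This is a generic sketch; 3QNet's actual codebook learning and entropy coding are more involved.

```python
import numpy as np

def quantize(features: np.ndarray, codebook: np.ndarray):
    """Map each feature vector to its nearest codeword. Only the integer
    indices (plus the shared codebook) then need to be stored or
    transmitted, which is the source of the compression gain."""
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]
```

The reconstruction error is the distance between each feature and its assigned codeword, so a codebook trained on common structural features keeps that error small across many kinds of point clouds.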

  • Research Article
  • Cited by 7
  • 10.3390/s22218217
DOPNet: Achieving Accurate and Efficient Point Cloud Registration Based on Deep Learning and Multi-Level Features
  • Oct 27, 2022
  • Sensors (Basel, Switzerland)
  • Rongbin Yi + 5 more

Point cloud registration aims to find a rigid spatial transformation to align two given point clouds; it is widely deployed in many areas of computer vision, such as target detection and 3D localization. To achieve the desired results, registration error, robustness, and efficiency should be comprehensively considered. We propose a deep learning-based point cloud registration method, called DOPNet. DOPNet extracts global features of the point clouds with a dynamic graph convolutional neural network (DGCNN) and cascading offset-attention modules, and the transformation is predicted by a multilayer perceptron (MLP). To enhance the information interaction between the two branches, a feature interaction module is inserted into the feature extraction pipeline to implement early data association. We compared DOPNet with the traditional iterative closest point (ICP) algorithm and four learning-based registration methods on the ModelNet40 dataset. In the experiments, the source and target point clouds were generated by sampling the original point cloud twice independently; we also conducted additional experiments with asymmetric objects. Further evaluation experiments were conducted with point cloud models from Stanford University. The results demonstrate that DOPNet generally outperforms these comparative methods, achieving more accurate and efficient point cloud registration.
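For reference, the rigid transform that methods like DOPNet predict with a network can, when point correspondences are known, be computed in closed form with the classic Kabsch/Procrustes solution. This is shown as a baseline sketch, not part of DOPNet; learned methods exist precisely because correspondences are usually unknown.

```python
import numpy as np

def kabsch(src: np.ndarray, dst: np.ndarray):
    """Closed-form least-squares rigid transform (R, t) aligning matched
    point sets so that R @ src_i + t ~= dst_i."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)          # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T)) # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t
```

ICP alternates this closed-form step with nearest-neighbour correspondence estimation, which is why it is the traditional baseline the paper compares against.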

  • Research Article
  • Cited by 27
  • 10.1016/j.patcog.2022.108784
Self-supervised rigid transformation equivariance for accurate 3D point cloud registration
  • May 10, 2022
  • Pattern Recognition
  • Zhiyuan Zhang + 5 more

